Advanced PostgreSQL Flashcards
How Do I Make Sure My Database Stays Intact?
Data Types As Constraints
Columns of a PostgreSQL database table must have a data type, which constrains the type of information that can be entered into that column. This is important in order to ensure data integrity and consistency over time. Some common PostgreSQL types are integer, decimal, varchar, and boolean. Data types are defined in a CREATE TABLE statement by indicating the data type after each column name.
How Do I Make Sure My Database Stays Intact?
Check Constraints
When using PostgreSQL, it can be important to enforce check constraints on columns of a database table in order to ensure data integrity and consistency over time. Check constraints can be enforced on a single column, multiple columns, or on all columns. They are implemented within a CREATE TABLE statement using the CHECK keyword.
How Do I Make Sure My Database Stays Intact?
Multiple Constraints
Columns in a database table can have multiple constraints. Multiple constraints can be implemented by listing them in a row following the relevant column name and data type in a CREATE TABLE statement. The order of the constraints does not matter.
How Do I Make Sure My Database Stays Intact?
NOT NULL
In PostgreSQL, NOT NULL constraints can be used to ensure that particular columns of a database table do not contain missing data. This is important for ensuring database integrity and consistency over time. NOT NULL constraints can be enforced within a CREATE TABLE statement using NOT NULL.
How Do I Make Sure My Database Stays Intact?
UNIQUE
In PostgreSQL, UNIQUE constraints can be used to ensure that elements of a particular column (or group of columns) are unique (i.e., no two rows have the same value or combination of values). This is important for ensuring database integrity and consistency over time. UNIQUE constraints can be enforced within a CREATE TABLE statement using the UNIQUE keyword.
How Do I Make Sure My Database Stays Intact?
Primary Key Constraint
In PostgreSQL, a primary key constraint indicates that a particular column (or group of columns) in a database table can be used to identify a unique row in that table. In terms of restrictions, this is equivalent to a UNIQUE NOT NULL constraint; however, a table may only have one primary key, whereas multiple columns can be constrained as UNIQUE NOT NULL. A primary key constraint can be enforced within a CREATE TABLE statement using PRIMARY KEY
How Do I Make Sure My Database Stays Intact?
Foreign Key Constraint
In PostgreSQL, a foreign key constraint ensures that the values in a particular column of a database table exactly match values in another database table. This is important to ensure the “referential integrity” of the database. A foreign key constraint can be enforced within a CREATE TABLE statement either by adding REFERENCES other_table_name (other_table_primary_key) after the relevant column name and type or using FOREIGN KEY (column_1, column_2) REFERENCES other_table_name (other_key1, other_key2) to indicate a link between groups of columns.
How Do I Make Sure My Database Stays Intact?
Cascade and Restrict
In PostgreSQL, when implementing foreign key constraints in a database table, it is possible to preemptively specify database behavior when values in a referenced column are updated or deleted. Specifying ON DELETE RESTRICT or ON UPDATE RESTRICT after a foreign key constraint ensures that referenced values/rows cannot be deleted/updated.
Specifying ON DELETE CASCADE or ON UPDATE CASCADE ensures that updated/deleted values/rows in the referenced table are automatically updated/deleted in the referencing table.
How Do I Make Sure My Database Stays Intact?
Updating A Table With Constraints
In PostgreSQL, when implementing a constraint on an existing table, the table must already be consistent with the constraint or PostgreSQL will reject the new constraint. A DB user may backfill the table using UPDATE or ALTER TABLE statements to make the table consistent with the constraint.
How Do I Make Sure My Database Stays Intact?
PostgreSQL’s default Superuser
PostgreSQL database clusters are created with a default role named Postgres. This is a special role that has SUPERUSER privileges and can modify all access restrictions within the database.
How Do I Make Sure My Database Stays Intact?
Principle of Least Privilege
The principle of least privilege says that applications and users should have the minimum permissions required for their function. In PostgreSQL, this means superusers should not be performing routine database tasks and specialized roles should be created for each user or application.
How Do I Make Sure My Database Stays Intact?
Checking User Permissions
In PostgreSQL, a user may check the system table, pg_catalog.pg_roles to understand what permissions users in the DB have. They may also check the information_schema.table_privileges table for more granular description of permissions at a table level.
How Do I Make Sure My Database Stays Intact?
PostgreSQL Roles
In PostgreSQL, roles are a collection of privileges that can be granted to one or more users.
How Do I Make Sure My Database Stays Intact?
SET into other roles in PostgreSQL
In PostgreSQL, a superuser can use SET ROLE to modify the current session to test the permissions of another user.
How Do I Make Sure My Database Stays Intact?
Creating Roles in PostgreSQL
In PostgreSQL, CREATE ROLE statements allow a user to create a new role. Keywords passed to CREATE ROLE like VALID UNTIL, LOGIN, IN GROUP, CREATEUSER, CREATEDB and PASSWORD allow for customization of the new user’s privileges.
– Create a role named miriam
with NOSUPERUSER
(is not a superuser) and LOGIN
(can log in to the DB)
CREATE ROLE miriam WITH NOSUPERUSER LOGIN;
How Do I Make Sure My Database Stays Intact?
GRANTING and REVOKING Database Privileges
In PostgreSQL, GRANT and REVOKE statements can be used to modify what privileges users have on an existing database object (e.g. schemas, databases, sequences, functions, etc.)
How Do I Make Sure My Database Stays Intact?
ALTER DEFAULT Permissions and Privilege
In PostgreSQL, ALTER DEFAULT statements can be used to set privileges on all new objects in a database or schema, this statement does not affect existing objects.
– Alter the default permissions to a schema, combine ALTER DEFAULT
with a GRANT
statement
ALTER DEFAULT PRIVILEGES IN SCHEMA finance GRANT SELECT ON TABLES TO analyst;
How Do I Make Sure My Database Stays Intact?
Creating a Group Role
In PostgreSQL, group roles are used to distribute permissions to sets of users. A CREATE ROLE statement that includes WITH ROLE name name name…. can be used to create a group comprised of a set of existing roles.
– Create a group role named marketing
with alice
and bob
as members.
CREATE ROLE marketing WITH NOLOGIN ROLE alice, bob;
How Do I Make Sure My Database Stays Intact?
Groups and Roles in PostgreSQL
In PostgreSQL, because groups are roles themselves, you can create a role and GRANT access to that group. The privileges granted to that group will also apply to the group’s members.
How Do I Make Sure My Database Stays Intact?
GRANT Permissions for subsets of columns
In PostgreSQL, GRANT statements can be applied to specific columns in a PostgreSQL table to limit the permissions a given user can access or modify.
How Do I Make Sure My Database Stays Intact?
PostgreSQL Row Level Security
In PostgreSQL, row level security is a pattern that only grants access to certain rows of a relation to a given role/group. Row level security can be implemented with CREATE POLICY and ENABLE ROW LEVEL SECURITY statements.
How Do I Make Sure My Database Stays Intact?
Database Transaction with ACID Properties
When working with a database, a single unit of work is defined as a transaction. While this unit of work can vary between databases in terms of its complexity, every transaction must maintain ACID properties. This allows for concurrent use of the database and helps with recovery in the case of a system failure.
How Do I Make Sure My Database Stays Intact?
Atomicity in ACID
When working with a database transaction, the ACID property Atomic means that if any part of the transaction fails, then the entire transaction will fail as well. This means that the database would remain unchanged once the transaction has finished if any one part fails.
How Do I Make Sure My Database Stays Intact?
Consistency in ACID
In most databases, certain constraints and triggers exist within it to ensure that all of the data is consistent. Transaction within the database must also follow these rules to maintain Consistency in the ACID properties.
How Do I Make Sure My Database Stays Intact?
Isolation in ACID
When using multiple database transactions at once, the ACID property of Isolation ensures that no two transactions interact with each other at the same time. Should two transactions end up interacting, they will be performed sequentially as to maintain isolation.
How Do I Make Sure My Database Stays Intact?
Durability in ACID
Once a database transaction has finished its operations, it must ensure that its operations were fully committed. This act is the Durability part of ACID properties, and makes sure the data can be recovered should anything go wrong.
How Do I Make Sure My Database Stays Fast?
A Downside of Indexes: Space
One of the downsides of creating an index in PostgreSQL is that indexes take up space. The index data structures can sometimes take up as much space as the database itself.
How Do I Make Sure My Database Stays Fast?
A Downside of Indexes: Data Entry and Updates
One of the downsides of creating an index in PostgreSQL is that indexes slow down data entry or modification. Whenever a new row is added that contains a column with an index, that index is modified as well. If you are adding a large amount of data to an existing table, it may be better to drop the index, add the data, and then recreate the index rather than having to update the index on each insertion.
How Do I Make Sure My Database Stays Fast?
Dropping an Index
In PostgreSQL, the DROP INDEX command can be used to drop an existing index. Indexes are dropped according to their name.
DROP INDEX IF EXISTS <index_name>;
How Do I Make Sure My Database Stays Fast?
EXPLAIN ANALYZE
In PostgreSQL, the keywords EXPLAIN ANALYZE can be used to get the query plan on for scripts. This can be used to see the runtime of a query.
EXPLAIN ANALYZE SELECT * FROM customers WHERE first_name = 'David';
How Do I Make Sure My Database Stays Fast?
Benefits of an Index
In a relational database like PostgreSQL, indexes are used to improve the speed of searching and filtering at the cost of slower inserts, updates, and deletes.
How Do I Make Sure My Database Stays Fast?
Multicolumn Indexes
In PostgreSQL, multicolumn indexes allow for more than one column to be used in combination as an index on a table.
The syntax to do this is identical to adding a single-column index, except multiple columns can be given in a comma-separated list.
CREATE INDEX customers_last_name_first_name_idx ON customers(last_name, first_name);
How Do I Make Sure My Database Stays Fast?
Database Size
In PostgreSQL, to see the size of the database, you can use pg_size_pretty and pg_total_relation_size.
This is a useful command to use before and after creating an index to see how much space the index is using.
SELECT pg_size_pretty (pg_total_relation_size('<table_name>'));
How Do I Make Sure My Database Stays Fast?
The pg_index Table
In PostgreSQL, the pg_indexes table contains information about what indexes exist on a table. pg_indexes can be queried like any other table.
SELECT * FROM pg_indexes WHERE tablename = '<table_name>';
How Do I Make Sure My Database Stays Fast?
Indexes with WHERE
Indexes are used by the database server to increase the speed when searches for specific records are performed. This is often used in the WHERE clause(s) and when two tables are joined together on their ON clause(s).
SELECT * FROM customers WHERE last_name = 'Jones';
How Do I Make Sure My Database Stays Fast?
Indexes and Primary Keys
In PostgreSQL, when a primary key is created on a table, the database server automatically creates a Unique Index on that table.
How Do I Make Sure My Database Stays Fast?
Clustered Indexes
A PostgreSQL database can have two types of indexes - clustered and non-clustered.
However, a table can only have one clustered index. This index physically changes the storage of the data in long term memory whereas a non-clustered index is a separate organization that references back to the original data.
How Do I Make Sure My Database Stays Fast?
Clustered Indexes
A PostgreSQL database can have two types of indexes - clustered and non-clustered.
However, a table can only have one clustered index. This index physically changes the storage of the data in long term memory whereas a non-clustered index is a separate organization that references back to the original data.
How Do I Make Sure My Database Stays Fast?
Non-Clustered Indexes
In PostgreSQL, a table can have multiple non-clustered indexes. These indexes create a key(s) and a pointer back to the table where the rest of the information can be found.
How Do I Make Sure My Database Stays Fast?
The CLUSTER Keyword
In PostgreSQL, the CLUSTER keyword can be used to create a new clustered index on a table, or recluster a table already setup with an index.
How Do I Make Sure My Database Stays Fast?
Avoiding Secondary Lookup
In PostgreSQL, if all columns being used in a query are part of an index, no secondary lookup is done.
How Do I Make Sure My Database Stays Fast?
Partial Indexes
PostgreSQL allows for indexing on a subset of a table using the WHERE clause. These are called Partial Indexes.
How Do I Make Sure My Database Stays Fast?
Ordered Indexes
PostgreSQL can use indexes to return results in order without a separate step to sort. This is done by specifying the order (ASC or DESC) you want the index to be in when you create the index.
– Ascending order
CREATE INDEX <index_name> ON <table_name> (<column_name> ASC)
How Do I Make Sure My Database Stays Fast?
Combining Indexes
PostgreSQL can use multiple indexes together in a single query. This is done automatically by the system. A database engineer must consider whether to make multiple single indexes that are combined, a multicolumn index, or all combinations of single and multicolumn indexes.
How Do I Make Sure My Database Stays Fast?
Indexes With Functions
A column Index is not limited to just a column reference, it can also be a function or scalar expression computed from one or more columns.
How Do I Make Sure My Database Stays Fast?
Database Scan
A scan search in a database is where every record in the table/view is searched to find the records requested by the query.
How Do I Make Sure My Database Stays Fast?
Database Seek
A seek search in a database is where the server jumps to specific records using an index.
How Do I Make Sure My Database Stays Fast?
Database Seek vs Scan Preference
A database server will try to use a seek search when it can, but it needs an index to work from that matches the search criteria. Additionally, the number of anticipated records must be a small enough subset of the total records in the table/view for the server to opt for a seek search.
How Do I Make Sure My Database Stays Fast?
Database Seek vs Scan Control
When searching for a record in a database, the server will automatically pick a seek or a scan depending on which one it thinks will be faster in the given situation. While the programmer does not need to do anything to make this choice, they should be aware of which search is being used so they can examine if changes to the query or creation on an index might be beneficial.
Normalizing a Database
Database Normalization
Database normalization is a process by which database and table structures are created or modified in order to address inefficiencies/complexities related to the following:
* data storage
* data modification
* querying database tables
Normalizing a Database
Repeating Column Groups
Repeating groups of columns in a database table can create inefficiencies and errors related to data storage, querying, and modification. For example, consider a songs table with the following columns:
- id
- title
- artist1_id
- artist1_name
- artist2_id
- artist2_name
The repeating artist-related columns likely contain duplicated data. It would also be difficult to sort this table by artist.
Normalizing a Database
Independent Columns
In a relational database, columns that are not dependent on the primary key of a table can create inefficiencies related to data storage and modification, while also increasing the potential for future data errors. This is often because columns that are not dependent on the primary key contain duplicated information. For example, in a books table with columns isbn, title, length, author_id, and author_name, the author-related columns will contain duplicated data if the same author has written multiple books; moving author-related information to a separate table would solve this problem.
Normalizing a Database
Updating Duplicated Data
If the same information is stored in multiple locations in a database table, a database manager needs to be careful when updating the table. For example, in the database table shown here, each customer’s email address is stored in multiple rows. Therefore, in order to update a customer email, multiple fields will need to be changed. Normalizing the table gets rid of duplicated data and therefore makes data errors less likely.
Normalizing a Database
Inserting Data Without a Primary Key
Problems can occur when updating a database table if new information needs to be inserted before the associated primary key is known. This can happen if columns are not dependent on the primary key. For example, consider a songs table with the following columns:
- song_id (primary key)
- title
- length
- artist_id
- artist_name
It would be impossible to add artist information to this table without also adding a value for song_id, which could be problematic.
Normalizing a Database
Database Efficiency and Use
The efficiency of any database schema is dependent on how the database is going to be used. While normalization solves many problems related to data storage, modification, and querying, it can also make some things more difficult. For example, tables in a normalized database will need to be joined back together if a query relies on information in multiple tables. It is therefore not always beneficial to normalize every database table. Decisions about schema design should be made with future use in mind!
Normalizing a Database
1NF Databases
A 1NF database is an atomic database. In this case, atomic means that each cell contains one value and each row is unique. In the given example, we can see that the non-atomic table has cells with more than one value and non-unique rows.
Normalizing a Database
A 2NF Database
When a database is said to be 2NF, that means the database is both 1NF and contains no partial dependencies. A partial dependency is when an attribute depends partly on the table’s primary key.
Normalizing a Database
A 3NF database
When making a 3NF database, two goals need to be accomplished. The first being that the database is already 2NF, and the second being that the database contains no transitive functional dependencies. A transitive functional dependency is when a non-prime attribute is dependent on another non-prime attribute.
Normalizing a Database
Update Anomalies
When updating data in a non-normalized database, sometimes not all of the data can get updated due to the lack of normalization. Another possible problem can be updating the wrong data. This is called an update anomaly and can be fixed by making sure the database has a higher level of normalization.
Normalizing a Database
Insertion Anomalies
Sometimes, when working with a non-normalized database, incomplete data being added to the database can lead to NULL values existing within the database. This is called an insertion anomaly and can be prevented by making sure the database has a higher level of normalization.
Normalizing a Database
Deletion Anomalies
When working with a non-normalized database, a deletion anomaly can occur. A deletion anomaly is when a query ends up deleting more data from the database than was intended due to a lack of normalization.1
Database Maintenence
Updates and Deletes Effect on Table Size
When using PostgreSQL, the size of database tables can grow unexpectedly large with routine UPDATE and DELETE operations.
Database Maintenence
PostgreSQL Dead Tuples
In PostgreSQL, when a row is deleted or updated, PostgreSQL creates so-called Dead tuples. Dead tuples are not referenced in the current version of our databases’ tables, but still occupy space on disk.
Database Maintenence
PostgreSQL Vacuuming
In PostgreSQL, to reclaim space from dead tuples, you can use VACCUUM, VACCUM ANALYZE, or VACCUM FULL, each comes with a different strategy for clearing dead tuples.
Database Maintenence
Importance of VACUUM
In PostgreSQL, It’s important to occasionally VACUUM tables to keep database queries performant and use database space efficiently.
Database Maintenence
PostgreSQL Analyze
In PostgreSQL, ANALYZE collects statistics about the contents of tables in the database, and stores the results in the system catalog so PostgreSQL can determine the efficient way to execute a query.
-- The statement to analyze a table named `schema.table`: ANALYZE schema.table;
Database Maintenence
VACUUM in PostgreSQL
In PostgreSQL, plain VACUUM can run in parallel with database operations, but VACUUM does not always fully reduce table sizes. Instead, it marks the space on disk as safe to overwrite with new data.
-- VACUUM `schemaname.tablename` with the below: VACUUM schemaname.tablename;
Database Maintenence
VACUUM FULL in PostgreSQL
In PostgreSQL, VACUUM FULL should be used to fully reclaim database space. However, VACUUM FULL rewrites the entire contents of the table into a new location on disk with no extra space allocated. This is an expensive operation and should be used sparingly.
Database Maintenence
Vacuum and Autovacuum in PostgreSQL
PostgreSQL has a feature called autovacuum, which automatically runs VACUUM and ANALYZE commands. When enabled, autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples.
Database Maintenence
PostgreSQL Truncate
In PostgreSQL, to improve performance of large deletes, TRUNCATE is preferable to DELETE, TRUNCATE is faster and automatically reclaims the space on disk.
Database Maintenence
PostgreSQL All Table Statistics
In PostgreSQL, you can monitor table statistics by querying the view pg_stat_all_tables. This view contains statistics like number of dead and live tuples, number of rows inserted, and last vacuum or autovacuum time.