Advanced PostgreSQL Flashcards

Question

# How Do I Make Sure My Database Stays Intact? Isolation in ACID

Answer 1

When multiple database transactions run at once, the ACID property of Isolation ensures that concurrent transactions do not interfere with each other. Should two transactions end up interacting, the database behaves as if they were performed sequentially, so as to maintain isolation.
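As a rough illustration, the sketch below runs a transfer at the strictest isolation level. The `accounts` table and its columns are hypothetical and not part of the card. PostgreSQL's default level is READ COMMITTED, so SERIALIZABLE has to be requested explicitly.

```sql
-- Hypothetical table: accounts(id, balance)
BEGIN;
-- Ask for the strictest isolation level for this transaction only
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- If a concurrent transaction conflicts, PostgreSQL may abort this one
-- with a serialization failure; the application is expected to retry it.
COMMIT;
```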

Answer 2

Once a database transaction has finished its operations and been committed, its changes must be stored permanently. This is the Durability part of the ACID properties, and it makes sure the data can be recovered should anything go wrong.

Answer 3

One of the downsides of creating an index in PostgreSQL is that indexes take up space. The index data structures can sometimes take up as much space as the database itself.

Answer 4

One of the downsides of creating an index in PostgreSQL is that indexes slow down data entry or modification. Whenever a new row is added that contains a column with an index, that index is modified as well. If you are adding a large amount of data to an existing table, it may be better to drop the index, add the data, and then recreate the index rather than having to update the index on each insertion.
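The drop, load, and recreate pattern described above might look like the following sketch. The index name, the `customers_staging` table, and the column list are assumptions made for illustration.

```sql
-- Hypothetical names: customers_last_name_idx, customers_staging
-- 1. Drop the index so it is not updated on every inserted row
DROP INDEX IF EXISTS customers_last_name_idx;

-- 2. Load the large batch of rows
INSERT INTO customers (first_name, last_name)
SELECT first_name, last_name
FROM customers_staging;

-- 3. Rebuild the index once, over the full table
CREATE INDEX customers_last_name_idx ON customers (last_name);
```

Whether this is faster than leaving the index in place depends on the size of the batch relative to the table, so it is worth timing both approaches.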

Answer 5

In PostgreSQL, the DROP INDEX command can be used to drop an existing index. Indexes are dropped according to their name. ``` DROP INDEX IF EXISTS ; ```

Answer 6

In PostgreSQL, the keywords EXPLAIN ANALYZE can be used to get the query plan for a query. Because ANALYZE actually executes the query, the output also shows its actual run time. ``` EXPLAIN ANALYZE SELECT * FROM customers WHERE first_name = 'David'; ```

Answer 7

In a relational database like PostgreSQL, indexes are used to improve the speed of searching and filtering at the cost of slower inserts, updates, and deletes.

Answer 8

In PostgreSQL, multicolumn indexes allow for more than one column to be used in combination as an index on a table. The syntax to do this is identical to adding a single-column index, except multiple columns can be given in a comma-separated list. ``` CREATE INDEX customers_last_name_first_name_idx ON customers(last_name, first_name); ```

Answer 9

In PostgreSQL, to see the size of a table, including its indexes, you can use pg_size_pretty and pg_total_relation_size. This is a useful query to run before and after creating an index to see how much space the index is using. ``` SELECT pg_size_pretty (pg_total_relation_size('')); ```

Answer 10

In PostgreSQL, the pg_indexes view contains information about what indexes exist on a table. pg_indexes can be queried like any other table. ``` SELECT * FROM pg_indexes WHERE tablename = ''; ```

Answer 11

Indexes are used by the database server to increase the speed of searches for specific records. They are most often useful for columns that appear in WHERE clauses and for columns on which two tables are joined in an ON clause. ``` SELECT * FROM customers WHERE last_name = 'Jones'; ```

Answer 12

In PostgreSQL, when a primary key is created on a table, the database server automatically creates a Unique Index on that table.
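One way to see this for yourself is sketched below, using a hypothetical `products` table. After creating the table, the automatically created index should show up in pg_indexes, by default with a name ending in `_pkey`.

```sql
-- Hypothetical table used only for illustration
CREATE TABLE products (
    product_id integer PRIMARY KEY,
    name       text NOT NULL
);

-- List the indexes on the new table; no CREATE INDEX was issued,
-- yet a unique index backing the primary key is present.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'products';
```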

Answer 13

Indexes are often described as being of two types: clustered and non-clustered. A table can only have one clustered index, because a clustered index changes the physical order in which the table's rows are stored on disk, whereas a non-clustered index is a separate structure that references back to the original data. In PostgreSQL, every index is stored separately from the table, and clustering is a one-time reordering performed with the CLUSTER command.

Answer 14

Indexes are often described as being of two types: clustered and non-clustered. A table can only have one clustered index, because a clustered index changes the physical order in which the table's rows are stored on disk, whereas a non-clustered index is a separate structure that references back to the original data. In PostgreSQL, every index is stored separately from the table, and clustering is a one-time reordering performed with the CLUSTER command.

Answer 15

In PostgreSQL, a table can have multiple non-clustered indexes. Each index stores one or more key values along with a pointer back to the table row where the rest of the information can be found.

Answer 16

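In PostgreSQL, the CLUSTER command physically reorders a table based on one of its existing indexes. It can be used to cluster a table for the first time, or to recluster a table that has already been set up with an index. Clustering is a one-time operation: rows inserted or updated later are not kept in index order.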
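A minimal sketch of both uses, assuming a `customers` table with a `last_name` column and a hypothetical index name:

```sql
-- Hypothetical index name; the index must exist before clustering on it
CREATE INDEX IF NOT EXISTS customers_last_name_idx ON customers (last_name);

-- Physically reorder the table by the index
CLUSTER customers USING customers_last_name_idx;

-- Later: recluster using the index remembered from the previous CLUSTER
CLUSTER customers;

-- Refresh planner statistics after the rows have been rewritten
ANALYZE customers;
```

CLUSTER holds an exclusive lock on the table while it runs, so it is usually scheduled for quiet periods.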

Answer 17

In PostgreSQL, if all columns used in a query are part of an index, the query can be answered from the index alone, with no secondary lookup in the table. This is called an index-only scan.
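The sketch below shows a query that only touches indexed columns, assuming a `customers` table with `last_name` and `first_name` columns. Whether the planner actually chooses an index-only scan depends on table statistics and on how recently the table was vacuumed, so the plan is worth checking rather than assuming.

```sql
-- Index covering both columns the query needs
CREATE INDEX IF NOT EXISTS customers_last_name_first_name_idx
    ON customers (last_name, first_name);

-- Only indexed columns are selected and filtered on, so the plan
-- may show "Index Only Scan" instead of a table lookup.
EXPLAIN
SELECT last_name, first_name
FROM customers
WHERE last_name = 'Jones';
```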

Answer 18

PostgreSQL allows for indexing on a subset of a table using the WHERE clause. These are called Partial Indexes.
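For example, a partial index could cover only the rows a frequent query cares about. The `orders` table, its columns, and the index name below are hypothetical.

```sql
-- Hypothetical table: orders(order_id, order_date, shipped boolean)
-- Index only the unshipped orders, which keeps the index small
CREATE INDEX orders_unshipped_order_date_idx
    ON orders (order_date)
    WHERE shipped = false;

-- A query whose WHERE clause matches the index predicate can use it
SELECT order_id, order_date
FROM orders
WHERE shipped = false
  AND order_date < DATE '2024-01-01';
```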

Answer 19

PostgreSQL can use indexes to return results in order without a separate step to sort. This is done by specifying the order (ASC or DESC) you want the index to be in when you create the index. -- Ascending order ``` CREATE INDEX ON ( ASC) ```

Answer 20

PostgreSQL can use multiple indexes together in a single query. This is done automatically by the system. A database engineer must consider whether to make multiple single indexes that are combined, a multicolumn index, or all combinations of single and multicolumn indexes.

Answer 21

An index is not limited to a plain column reference; it can also be built on a function or scalar expression computed from one or more columns.
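A common case is a case-insensitive lookup. The sketch assumes the `customers` table has an `email` column, which the card does not state.

```sql
-- Index the lower-cased value rather than the raw column
CREATE INDEX customers_lower_email_idx
    ON customers (LOWER(email));

-- The query must use the same expression for the index to apply
SELECT *
FROM customers
WHERE LOWER(email) = 'david@example.com';
```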

Answer 22

A scan search in a database is where every record in the table/view is searched to find the records requested by the query.

Answer 23

A seek search in a database is where the server jumps to specific records using an index.

Answer 24

A database server will try to use a seek search when it can, but it needs an index to work from that matches the search criteria. Additionally, the number of anticipated records must be a small enough subset of the total records in the table/view for the server to opt for a seek search.

Answer 25

When searching for a record in a database, the server will automatically pick a seek or a scan depending on which one it thinks will be faster in the given situation. While the programmer does not need to do anything to make this choice, they should be aware of which search is being used so they can examine whether changes to the query or the creation of an index might be beneficial.
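In PostgreSQL, EXPLAIN is one way to check which search was chosen: a scan appears in the plan as "Seq Scan", while a seek appears as "Index Scan", "Index Only Scan", or a bitmap scan. The index name below is hypothetical, and the plans you see will depend on the data.

```sql
-- Before: with no matching index, expect a "Seq Scan" node
EXPLAIN SELECT * FROM customers WHERE last_name = 'Jones';

-- Add an index that matches the search criteria
CREATE INDEX IF NOT EXISTS customers_last_name_idx ON customers (last_name);
ANALYZE customers;

-- After: the planner may now pick an index-based plan, provided it
-- expects only a small share of the rows to match
EXPLAIN SELECT * FROM customers WHERE last_name = 'Jones';
```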

Answer 26

Database normalization is a process by which database and table structures are created or modified in order to address inefficiencies/complexities related to the following:

* data storage
* data modification
* querying database tables

Answer 27

Repeating groups of columns in a database table can create inefficiencies and errors related to data storage, querying, and modification. For example, consider a songs table with the following columns:

* id
* title
* artist1_id
* artist1_name
* artist2_id
* artist2_name

The repeating artist-related columns likely contain duplicated data. It would also be difficult to sort this table by artist.
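One possible way to remove the repeating group is sketched below. The `artists` and `song_artists` tables and their column names are assumptions, not part of the card.

```sql
-- Each artist is stored once
CREATE TABLE artists (
    artist_id integer PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE songs (
    id    integer PRIMARY KEY,
    title text NOT NULL
);

-- One row per (song, artist) pair replaces artist1_*, artist2_*, ...
CREATE TABLE song_artists (
    song_id   integer REFERENCES songs (id),
    artist_id integer REFERENCES artists (artist_id),
    PRIMARY KEY (song_id, artist_id)
);

-- Sorting by artist is now a simple ORDER BY
SELECT a.name, s.title
FROM songs s
JOIN song_artists sa ON sa.song_id = s.id
JOIN artists a ON a.artist_id = sa.artist_id
ORDER BY a.name;
```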

Answer 28

In a relational database, columns that are not dependent on the primary key of a table can create inefficiencies related to data storage and modification, while also increasing the potential for future data errors. This is often because columns that are not dependent on the primary key contain duplicated information. For example, in a books table with columns isbn, title, length, author_id, and author_name, the author-related columns will contain duplicated data if the same author has written multiple books; moving author-related information to a separate table would solve this problem.
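The split described above could be sketched as follows. Column types are assumptions, since the card only names the columns.

```sql
-- Author details live in one place
CREATE TABLE authors (
    author_id   integer PRIMARY KEY,
    author_name text NOT NULL
);

-- Every remaining column depends on the primary key, isbn
CREATE TABLE books (
    isbn      text PRIMARY KEY,
    title     text NOT NULL,
    length    integer,
    author_id integer REFERENCES authors (author_id)
);

-- The original wide view can be recovered with a join
SELECT b.isbn, b.title, b.length, a.author_name
FROM books b
JOIN authors a ON a.author_id = b.author_id;
```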

Answer 29

If the same information is stored in multiple locations in a database table, a database manager needs to be careful when updating the table. For example, in the database table shown here, each customer’s email address is stored in multiple rows. Therefore, in order to update a customer email, multiple fields will need to be changed. Normalizing the table gets rid of duplicated data and therefore makes data errors less likely.

Answer 30

Problems can occur when updating a database table if new information needs to be inserted before the associated primary key is known. This can happen if columns are not dependent on the primary key. For example, consider a songs table with the following columns:

* song_id (primary key)
* title
* length
* artist_id
* artist_name

It would be impossible to add artist information to this table without also adding a value for song_id, which could be problematic.

Answer 31

The efficiency of any database schema is dependent on how the database is going to be used. While normalization solves many problems related to data storage, modification, and querying, it can also make some things more difficult. For example, tables in a normalized database will need to be joined back together if a query relies on information in multiple tables. It is therefore not always beneficial to normalize every database table. Decisions about schema design should be made with future use in mind!

Answer 32

A 1NF database is an atomic database. In this case, atomic means that each cell contains one value and each row is unique. In the given example, we can see that the non-atomic table has cells with more than one value and non-unique rows.

Answer 33

When a database is said to be 2NF, that means the database is both 1NF and contains no partial dependencies. A partial dependency is when a non-key attribute depends on only part of the table’s composite primary key rather than on the whole key.

Answer 34

When making a 3NF database, two goals need to be accomplished. The first is that the database is already 2NF, and the second is that the database contains no transitive functional dependencies. A transitive functional dependency is when a non-prime attribute is dependent on another non-prime attribute.

Answer 35

When updating data in a non-normalized database, sometimes not all of the data can get updated due to the lack of normalization. Another possible problem can be updating the wrong data. This is called an update anomaly and can be fixed by making sure the database has a higher level of normalization.

Answer 36

Sometimes, when working with a non-normalized database, incomplete data being added to the database can lead to NULL values existing within the database. This is called an insertion anomaly and can be prevented by making sure the database has a higher level of normalization.

Answer 37

When working with a non-normalized database, a deletion anomaly can occur. A deletion anomaly is when a query ends up deleting more data from the database than was intended due to a lack of normalization.

Answer 38

When using PostgreSQL, the size of database tables can grow unexpectedly large with routine UPDATE and DELETE operations.

Answer 39

In PostgreSQL, when a row is deleted or updated, the old version of the row is left behind as a so-called dead tuple. Dead tuples are not referenced in the current version of the database’s tables, but still occupy space on disk.

Answer 40

In PostgreSQL, to reclaim space from dead tuples, you can use VACUUM, VACUUM ANALYZE, or VACUUM FULL. Each comes with a different strategy for clearing dead tuples.
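The three variants can be sketched side by side, using the `customers` table as a stand-in for any table:

```sql
-- Plain VACUUM: marks dead-tuple space as reusable and can run
-- alongside normal reads and writes
VACUUM customers;

-- VACUUM ANALYZE: same as above, then refreshes planner statistics
VACUUM ANALYZE customers;

-- VACUUM FULL: rewrites the whole table to return space to the
-- operating system; it locks the table, so use it sparingly
VACUUM FULL customers;
```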

Answer 41

In PostgreSQL, it’s important to occasionally VACUUM tables to keep database queries performant and use database space efficiently.

Answer 42

In PostgreSQL, ANALYZE collects statistics about the contents of tables in the database, and stores the results in the system catalog so PostgreSQL can determine the most efficient way to execute a query. ``` -- The statement to analyze a table named `schema.table`: ANALYZE schema.table; ```

Answer 43

In PostgreSQL, plain VACUUM can run in parallel with database operations, but VACUUM does not always fully reduce table sizes. Instead, it marks the space on disk as safe to overwrite with new data. ``` -- VACUUM `schemaname.tablename` with the below: VACUUM schemaname.tablename; ```

Answer 44

In PostgreSQL, VACUUM FULL should be used to fully reclaim database space. However, VACUUM FULL rewrites the entire contents of the table into a new location on disk with no extra space allocated. This is an expensive operation and should be used sparingly.

Answer 45

PostgreSQL has a feature called autovacuum, which automatically runs VACUUM and ANALYZE commands. When enabled, autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples.
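As a sketch, autovacuum's current configuration can be inspected and, if needed, tuned per table. The threshold value below is only an illustration, not a recommendation, and `customers` stands in for any busy table.

```sql
-- Is autovacuum enabled on this server?
SHOW autovacuum;

-- Inspect all autovacuum-related settings
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'autovacuum%';

-- Illustrative per-table override: trigger autovacuum once roughly
-- 5% of the table's rows have been updated or deleted
ALTER TABLE customers SET (autovacuum_vacuum_scale_factor = 0.05);
```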

Answer 46

In PostgreSQL, to improve the performance of large deletes that remove every row in a table, TRUNCATE is preferable to DELETE. TRUNCATE is faster and automatically reclaims the space on disk.
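A minimal comparison, using a hypothetical `staging_events` table. TRUNCATE takes no WHERE clause, so it only applies when the whole table is being emptied.

```sql
-- Removes every row and releases the disk space immediately
TRUNCATE TABLE staging_events;

-- The DELETE equivalent removes the same rows one by one and leaves
-- dead tuples behind until the table is vacuumed:
-- DELETE FROM staging_events;
```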

Answer 47

In PostgreSQL, you can monitor table statistics by querying the view pg_stat_all_tables. This view contains statistics like number of dead and live tuples, number of rows inserted, and last vacuum or autovacuum time.
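For instance, the query below lists tables with the most dead tuples first, which can help decide where a VACUUM is worth running. The `public` schema filter is an assumption about where the tables live.

```sql
SELECT relname,
       n_live_tup,       -- estimated live rows
       n_dead_tup,       -- estimated dead rows
       n_tup_ins,        -- rows inserted
       last_vacuum,      -- last manual VACUUM
       last_autovacuum   -- last autovacuum run
FROM pg_stat_all_tables
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC;
```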

(71 cards)