Basic Administration Flashcards
Name some security features
- SSL
- host-based authentication
- object-level permissions
- logging
- groups & roles
Name some recovery and availability features
- Streaming replication (async and sync), aka hot standby
- Cascading streaming replication
- pg_basebackup, hot backup
- PITR (point in time recovery)
Name some cool advanced features
- triggers and functions
- many procedural languages (PL/pgSQL, Perl, Tcl, PHP, Java, Python, etc)
- custom procedural languages
- upgrade using pg_upgrade
- unlogged tables
- materialized views
What does MVCC stand for
Multi-Version Concurrency Control
List some characteristics of MVCC
- Maintains data consistency.
- Each transaction sees a snapshot of the database as it was at the beginning of the transaction.
- Writers block writers; nothing else blocks.
- Concurrent transactions are isolated.
- Prevents transactions from seeing inconsistent data.
What does WAL stand for?
Write-Ahead Logs (or Write-Ahead Logging)
How does WAL work?
- Records each data change before it actually takes place.
- Data is not considered ‘safe’ until the log is written to disk.
- Provides recovery in case of crash or failure.
- As DML is executed, changes are recorded in memory in WAL buffers (as well as the shared data buffers).
- When COMMIT is executed, the WAL buffers are flushed to the WAL segment files
- Later the dirty data buffers are written to disk
What is ACID?
Atomicity, Consistency, Isolation, Durability
List some commonly used pg_dump switches
- -a  Data only
- -s  Definitions (“schema”) only
- -n NAME  Dump named schema (“namespace”)
- -t NAME  Dump named table
- -Fp  Plain format
- -Fc  Custom format
- -Fd  Directory format
- -Ft  Tar format
- -f FILE  Dump to named file
- -o  Dump OIDs
- -j JOBS  Parallelize directory-format dumps
- -v  Verbose
Name four general methods for backup
- SQL dump (pg_dump, pg_dumpall)
- Filesystem dump
- Continuous archiving
- Streaming replication
List two methods (command line examples) for dealing with very large databases when doing SQL backups
- pg_dump … | gzip -c > file.sql.gz
- pg_dump … | split -b 1m - filename
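The compress-and-split mechanics can be demonstrated without a live cluster; here a generated file stands in for the pg_dump output stream (on the restore side, psql would read the reassembled stream the same way):

```shell
# Stand-in for pg_dump output: a dummy SQL script (no live cluster needed).
seq 1 1000 | sed 's/.*/INSERT INTO t VALUES (&);/' > dump.sql

# Method 1: compress the dump stream.
gzip -c dump.sql > dump.sql.gz
# Restore side would be: gunzip -c dump.sql.gz | psql dbname
gunzip -c dump.sql.gz > restored1.sql

# Method 2: split the dump into fixed-size chunks (1MB in the card; 8KB here).
split -b 8k dump.sql chunk_
# Restore side would be: cat chunk_* | psql dbname
cat chunk_* > restored2.sql

cmp dump.sql restored1.sql && cmp dump.sql restored2.sql && echo OK
```

Note that `cat chunk_*` works because split's generated suffixes (aa, ab, …) sort in creation order.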
Describe two methods for restoring SQL dumps done with pg_dump
- psql < TEXT_FORMAT_DUMP_FILE (or psql -f TEXT_FORMAT_DUMP_FILE)
- pg_restore NON_TEXT_FORMAT_DUMP_FILE
Using command defaults for pg_dump, when you restore a backup into a new cluster, is there anything you need to do first?
Yes, create the target database. By default, pg_dump does not include the command to create the target database.
What does the --create (-C) switch to pg_dump do?
pg_dump --create
inserts a command to create and reconnect to the target database before the commands to populate that database.
What does the --clean switch to pg_dump do?
pg_dump --clean
drops the target database prior to recreating it, when using the --create (-C) switch.
List important pg_restore options
pg_restore
- -d DB  Connect to the specified database, and, if -C (--create) is not specified, restore into this database also
- -C  Create the database specified in the backup and restore into it
- -a  Restore data only (assumes the SQL objects like tables have already been created)
- -s  Restore object definitions (DDL) only, not data
- -t NAME  Restore named table
- -n NAME  Restore named schema/namespace
- -v  Verbose
What command can dump an entire cluster as SQL?
pg_dumpall
What does pg_dumpall do?
pg_dumpall dumps all databases in the cluster, and also global objects such as roles and tablespaces.
List commonly used pg_dumpall switches
pg_dumpall
- -a  Dump data only
- -s  Dump definitions only
- -c  Clean (drop) objects before recreating
- -g  Dump global objects (roles, etc) only, not databases
- -r  Dump roles only, not databases or other global objects
- -O  Skip ownership commands
- -x  Skip privileges
- --disable-triggers  Disable triggers during data-only restore
- -v  Verbose
How many times does pg_dumpall ask for authentication credentials?
At least once per database in the cluster, plus once for the initial connection that collects global objects.
Why is it a good idea to use a pgpass file when using pg_dumpall?
A pgpass file is a good idea for pg_dumpall because the user’s credentials are requested once per database in the cluster.
List some characteristics of SQL dumps as a backup mechanism
- Generate a text file containing SQL commands (or a binary representation thereof)
- pg_dump is the relevant command (or pg_dumpall for the entire cluster)
- pg_dump does not block readers or writers
- pg_dump does not operate with special permissions
- pg_dump dumps are internally consistent and are a snapshot of the database at the time pg_dump begins running
Next to SQL dump, what is the next simplest backup approach?
Simple file system level backup is an alternative to SQL dumps, provided that either 1) the database is shut down during the backup, or 2) the native snapshot feature of the filesystem is used (if available).
Why mightn’t filesystem snapshots work without database downtime, even if your filesystem supports this feature?
Filesystem snapshots might not give a consistent backup if the database is spread across multiple filesystems, since the snapshots cannot all be taken at exactly the same instant; in that case you would have to stop the database while taking the snapshots.
Describe the process of restoring a backup made via a filesystem snapshot
A filesystem snapshot backup can be restored by 1) copying the backup into place (including the WAL logs) and 2) starting the database, which will go into crash recovery mode and replay WAL logs.
How can you take your filesystem snapshot backup so that the restore is as fast as possible?
To minimize restoration time of a filesystem snapshot, force a CHECKPOINT right before the snapshot – this will minimize the amount of WAL logs that have to be replayed.
Can you use tar to back up a database? If so, describe how.
Yes, you can back up a PostgreSQL database by 1) shutting down the database; 2) using tar to copy all of the files in the database cluster; and 3) restarting the database.
How would you use rsync to backup a database?
You can back up a database using rsync as follows: 1) With the database running, use rsync to copy all the files. 2) Shut down the database. 3) Use rsync again to copy the files; this second rsync will create a consistent image of the database and will be quite fast, minimizing downtime.
After SQL dumps and simple filesystem backups, what is a third, more complicated mechanism for backups?
A third backup approach is continuous archiving of WAL logs (combined with a possibly inconsistent filesystem backup, such as produced by tar, even with the database running).
What is another name for continuous archiving as used for backup purposes?
Online backup
What is the primary purpose of WAL logs?
The primary purpose of WAL logs is to allow database commits to happen quickly (without the data being fully written to the final data pages) but to prevent loss of information in case of a crash – the WAL logs can be “played” when the database starts up after a crash, thus restoring the physical database to match its logical state at the time of the crash.
What is a secondary use to which WAL logs can be put?
Online backup – first, get WAL log archiving started, in which full and switched WAL logs are copied to backup storage before being recycled; then take a file system backup while the database is running (tar, rsync, etc). If the database crashes, or if you want to revert the database to a specified point in time (PIT), you can copy the original full backup into place along with the archived WAL files, and start the database.
How do you enable continuous archiving of WAL files?
- wal_level must be set to archive or hot_standby (as opposed to, say, minimal)
- archive_mode must be set to on (default is off)
- archive_command must be defined: a command to copy the WAL files somewhere
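A minimal postgresql.conf fragment putting the three settings together (the archive directory path is illustrative):

```
# postgresql.conf -- minimal continuous-archiving setup
wal_level = archive           # or hot_standby; 'minimal' is not enough
archive_mode = on             # requires a server restart
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```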
How can you achieve point in time recovery (PITR) with continuous archiving?
PITR can be accomplished by having the database replay the WAL files only up to a specified file, not all the way to the last file.
How can you use continuous archiving to achieve warm standby?
Have a second server loaded with the base backup file (filesystem-level backup), and feed the archived WAL files to this second server. At any time, the WAL files can be replayed on this second server so that the second server can take over from the first with a nearly identical state.
Why can’t pg_dump and pg_dumpall be used with online backup via continuous archiving?
pg_dump and pg_dumpall produce logical, not physical, backups of the database. They don't capture the file-level (physical) state of the cluster that archived WAL files must be replayed against; WAL replay requires a physical base backup.
What is the default size of WAL segment files?
16MB – this can be changed by recompiling PostgreSQL
How are WAL segment files named?
Numerically, according to the position in the abstract WAL sequence. I say abstract sequence because there is only a small number of physical files, which are recycled by renaming when a particular WAL file is no longer needed (it has been checkpointed; i.e. the changes it encodes have been reflected to the actual data pages).
What is the simplest useful archive_command value on Unix systems?
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
What does %f mean in the archive_command?
%f is the base file name of the WAL file
What does %p mean in the archive_command?
%p is the full path name of the WAL file
How do you specify a literal percent sign in the archive_command?
%%
Describe the return value/exit status of the archive_command
0 if the file could be copied successfully; otherwise, non-zero.
What happens if the archive_command starts returning non-zero for every file (perhaps because of network error, unmounted fs, etc)?
The pg_xlog directory will continue to fill up with WAL files. If the containing filesystem fills up, PostgreSQL will do a “panic” shutdown – no committed transactions will be lost, however.
Does continuous archiving back up changes made to postgresql.conf, pg_hba.conf, etc?
No, you must have another approach to back up changes to the configuration files in the data directory.
If you exclude the PostgreSQL data directory from the server’s normal backup procedures, how can the config files be continuously backed up (since continuous archiving does not do so)?
Start PostgreSQL with -D config_dir, where config_dir is a directory outside of the data directory, some place where the normal operating system backup procedures will back the directory up. Then, in postgresql.conf within this directory, use the data_directory parameter to point to the actual data directory.
How could you temporarily turn off WAL archiving without stopping the server?
Set archive_command to the empty string and reload the server – WAL files will start to accumulate in the pg_xlog directory, though.
What is pg_basebackup?
pg_basebackup is a command for taking a base backup of the live database, to which you would apply archived WAL files in order to recover from a disaster.
Describe pg_receivexlog
- streams transaction logs from a running cluster
- uses streaming replication protocol
- these files can be used for PITR
- logs streamed in real time
- can be used instead of archive command
- example: pg_receivexlog -h localhost -D /usr/local/pgsql/archive
Describe how the archive_command `cp -i %p /mnt/server/archivedir/%f </dev/null` works
Uses the Unix file copy (cp) command to copy the source WAL file (%p - path) to the desired destination path (/mnt/server/archivedir/%f). The twist is that the interactive (-i) switch is used, presumably to prevent an already existing file of the same name from being overwritten. “But wait”, you say, “how can this command be interactive? There’s no human involved!” That’s where the redirection from /dev/null comes in – if a file would be overwritten, the prompt is written, and the empty response causes an error (status 1), which is what we would want in this scenario.
Describe the “low level API” for base backup that is an alternative to using the pg_basebackup command
- Connect with psql and issue the command "select pg_start_backup('backup label');"
- Back up the data directory using tar or rsync, etc.
- Now issue the SQL command "select pg_stop_backup();"
[Note: no label is required for the stop; it’s assumed (or enforced) that there is only one backup happening at a time.]
What CLI command can be used to take a base backup of a live cluster?
pg_basebackup
Can a backup taken by pg_basebackup be used for PITR?
Yes
Can a backup taken by pg_basebackup be used for streaming replication?
Yes
What does pg_basebackup do, exactly?
It uses the low-level backup API (pg_start_backup('label') and pg_stop_backup()) wrapped around some binary copying mechanism, i.e. it automatically puts the database in and out of backup mode.
List important pg_basebackup switches/options
- -D DIR  destination for backup
- -F p|t  format: plain or tar
- -X  include WAL files created during the backup, so the backup is fully usable
- -z  compress with gzip
- -Z LEVEL  compression level
- -P  enable progress reporting
- -h  host on which cluster is running
- -p  cluster port
Describe what this command is doing: pg_basebackup -h localhost -D /usr/local/pgsql/backup
pg_basebackup is being used to take a base backup of the cluster running on localhost; the files will be written to /usr/local/pgsql/backup/.
Describe the configuration steps required to use pg_basebackup
- Modify pg_hba.conf to add a replication connection, e.g.: "host replication postgres IP_ADDR/32 trust"
- archive_command = 'cp -i %p /dest/dir/%f'
- archive_mode = on  # requires restart
- max_wal_senders = 3
- wal_keep_segments = NUM
- wal_level = archive
Describe the steps to performing (Point-In-Time) Recovery
- Stop the server (if it is running).
- If you have enough space, keep a copy of the data directory and the transaction logs.
- Remove all directories and files from the cluster data directory.
- Restore the database files from the base backup (file system backup).
- Verify the ownership of restored backup directories (must not be root).
- Remove any files in pg_xlog/.
- If you have any unarchived WAL segment files recovered from the crashed cluster, copy them into pg_xlog/.
- Create a recovery command file recovery.conf in the cluster data directory.
- Start the server.
- Upon completion of the recovery process, the server will rename recovery.conf to recovery.done.
List the settings (with types) in the recovery.conf file used for point-in-time recovery
restore_command (string)
  # unix:    restore_command = 'cp /archive/dir/%f "%p"'
  # windows: restore_command = 'copy c:\\archive\\dir\\"%f" "%p"'
recovery_target_name (string)
recovery_target_time (timestamp)
recovery_target_xid (string)
recovery_target_inclusive (boolean)
recovery_target_timeline (string)
pause_at_recovery_target (boolean)
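A minimal recovery.conf sketch for recovering up to a specific moment (the timestamp and archive path are illustrative):

```
# recovery.conf -- placed in the cluster data directory before startup
restore_command = 'cp /archive/dir/%f "%p"'
recovery_target_time = '2014-06-01 12:00:00'
recovery_target_inclusive = true
```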
Why might one choose to set archive_timeout to something other than the default of 0?
When doing WAL archiving, remember that only full (16MB) WAL files are shipped, so if the transaction rate and volume are low, you could be exposed to losing data in case of catastrophe. The archive_timeout forces a WAL log to be shipped after the specified number of seconds has passed, regardless of whether it is full or not. If the value is too low (but not zero), it could lead to WAL bloat in the archive.
Describe what the pg_xlogdump contrib module does and how to use it
- pg_xlogdump displays the WAL in human-readable format
- can give wrong results when the server is running
- Syntax: pg_xlogdump [option ...] [startseg [endseg]]
What is the name of the contrib module that can display the WAL in a human-readable format
pg_xlogdump
What are the two methods for upgrading PostgreSQL?
- Old-school: pg_dump and then restore into the new cluster
- New-school: pg_upgrade
List important features of pg_upgrade
- Helps upgrade between major releases of PostgreSQL (that would ordinarily require dump and restore)
- Supports upgrading PG 8.3.X or later to latest version
- Verifies old and new clusters are binary compatible
- -c (--check) option checks the clusters for compatibility without changing anything
- Can do in-place or side-by-side upgrade
- Side-by-side upgrade requires double storage
- Can be done with parallel jobs in PG 9.3+
List important pg_upgrade options/switches
- -b old_bindir
- -B new_bindir
- -d old_datadir
- -D new_datadir
- -p old_port
- -P new_port
- -c  check clusters only (no change)
- -j  number of jobs
- -k  use hard links instead of copying files
- -r  retain SQL and log files even after successful completion
- -u user_name
What PG architecture component creates execution plans?
Planner
What PG architecture component chooses the most efficient plan for a query?
Optimizer
List some execution plan components
- row estimates (cardinality)
- access method: sequential or index
- join method: hash, nested loop, etc
- join type and order
- sorting and aggregates
Describe EXPLAIN usage
EXPLAIN (options) statement
where option can be one of:
ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING, FORMAT {TEXT|XML|JSON|YAML}
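For example, against the emp table used elsewhere in these cards (the grouping column is illustrative):

```sql
-- Execute the query, reporting actual row counts, timings, and buffer usage:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT deptno, count(*) FROM emp GROUP BY deptno;
```

ANALYZE actually runs the statement, so wrap data-modifying statements in a transaction you roll back.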
In the explain output:
seq scan on emp (cost=X..Y rows=R width=W) what are X, Y, R, and W?
X=estimated startup cost
Y=estimated total cost
R=estimated rows output
W=estimated average width in bytes of rows output
Plans are made and costed using table statistics - what are two primary table statistics?
Table statistics in pg_class include reltuples, the estimated number of rows in a table, and relpages, the estimated number of disk blocks taken up by a table or index.
Are table statistics updated in real time?
No, table statistics are updated using ANALYZE or VACUUM ANALYZE
What tables are the table statistics stored in?
pg_class and pg_statistic store the table statistics; pg_stats is a view on pg_statistic that is more commonly used.
What gets analyzed when you run ‘ANALYZE’ (with no options)?
The tables (and their indexes) in the current database that the current user has permission to analyze (in practice, those the user owns).
Can you ANALYZE just a single table or column
Yes - “ANALYZE some_table” or “ANALYZE some_table(some_col)”
Does autovacuum run ANALYZE
Yes, autovacuum runs ANALYZE by default.
Does a DELETE immediately remove a row?
No, DELETE merely marks a row as deleted; the row space can be reused or removed by VACUUM .
What are four useful functions of VACUUM?
- Can recover or reuse space occupied by obsolete rows (from deletes and updates)
- Can (via ANALYZE option) update data statistics
- Updates the visibility map, which speeds up index-only scans
- Protects against loss of very old data due to transaction ID wraparound
What is the visibility map useful for?
It makes index-only scans more efficient (and more likely to be chosen by the query planner) - the full heap tuple doesn’t need to be read in order to determine whether an index entry is valid or not.
What does plain VACUUM do (as opposed to VACUUM FULL)?
- removes dead rows and marks the space available for future reuse.
- Does not shrink the file except for dead rows at the end of the table
What does VACUUM FULL do that plain vacuum does not do?
- Uses a more aggressive algorithm: it rewrites the entire table into a new file with no dead space
- Takes much more time
- Temporarily requires extra disk space roughly equal to the size of the table, since the old and new copies coexist until the rewrite completes
Describe the problem of transaction ID wraparound failure?
Each transaction has a 32-bit ID (XID) that is allocated serially, and each row version is marked with the XID of the transaction that created it. A transaction is not allowed to see rows with XIDs that are larger than its own, because these are “in the future”. But because there is a limit to how large XIDs can be (2^32, approx 4 billion), at some point XIDs wrap around to 0, at which point a row with a new, low XID is seen as older than higher XIDs, when actually it is newer. Vacuuming prevents this by “freezing” old rows: they are marked as frozen (historically by setting their XID to the special FrozenXID, in newer versions by setting a flag bit), which means “committed so long ago as to be visible to all transactions”, so they are exempt from XID comparisons and survive wraparound.
How often must a table be vacuumed to avoid transaction ID wraparound failure?
At least once every two billion transactions.
What is the visibility map?
The visibility map for a table keeps track of which pages contain only tuples that are visible to all current and future transactions (until the page is modified, anyway). This has two benefits: 1) it helps vacuum avoid looking at pages unnecessarily, and 2) it allows index scans to avoid grabbing the heap tuple merely to determine if the current transaction is allowed to see the tuple for an index entry. The VM is tiny compared to the table/heap, so for very large tables, the cost savings can be significant. This allows “index-only scans” to be used.
What is the name of the file that contains the visibility map for a table?
_vm
How does the visibility map get updated?
VACUUM updates it.
What can you say about the size of the visibility map of a table?
The visibility map is very small, so it is readily cached, and many index entries can be checked for visibility with very little memory or disk I/O.
When do you need to run REINDEX?
REINDEX should be run when:
- an index is corrupted (rare)
- an index is bloated (many nearly-empty pages)
- a storage parameter for the index (like fillfactor) has been changed
- an index build with CONCURRENTLY failed, leaving an invalid index
What is the syntax for REINDEX?
REINDEX {INDEX|TABLE|DATABASE|SYSTEM} name [FORCE]
Where is meta-information about tables and other objects stored?
The pg_catalog schema is where system information about a database is stored.
What sorts of objects are stored in pg_catalog?
The following objects are stored in the pg_catalog schema:
- System tables (like pg_class)
- System functions (e.g. pg_database_size)
- System views (pg_stat_activity)
Is pg_catalog always part of the search path?
Yes, pg_catalog is effectively part of the search_path.
What psql command lists system information tables and views?
The \dS psql command lists tables and views from the pg_catalog schema (in addition to other tables and schemas in the search path).
What are the system catalog tables holding tables, constraints, indexes, triggers, and views?
pg_tables, pg_constraint (no s), pg_indexes, pg_trigger (no s), and pg_views - (thanks for the consistency, guys!) These are all views except for pg_constraint.
What is the system function for showing the database to which you are connected?
current_database() is the system function for showing the database to which you are connected
What is the system function for showing the first schema in the search path?
current_schema() is the system function for showing the first schema in the search path.
What are the system functions for showing the client IP address and port and server address and port for the current connection?
inet_client_addr, inet_client_port, inet_server_addr, inet_server_port
What is the system function for telling how long the cluster has been up?
pg_postmaster_start_time
What are the columns in the \dS output?
schema, relation name, relation type, and owner
What extra columns are shown by \dS+?
size and description (the latter is empty for system objects)
What is the difference between the session_user and current_user/user system functions?
session_user() is analogous to the UNIX “real user”, current_user() to the “effective user”
What system function returns the array of schemas in the search path?
current_schemas(boolean)
What does the boolean parameter to the current_schemas() function do?
If true, the schemas implicitly in the search path (usually pg_catalog) are also included. If false, just the schemas from the normal explicit search_path are included.
What is the system function for returning the value of a configuration parameter/variable?
current_setting(setting_name_str)
What is the system function for modifying a configuration variable?
set_config()
What SQL command is the current_setting() function equivalent to?
SHOW
What SQL command is the set_config() function equivalent to?
SET
What system function cancels the current query of a backend process?
pg_cancel_backend(pid) cancels the current query in a backend process - the argument is pid
What system function terminates/kills a backend process?
pg_terminate_backend(pid)
What is the usage for the system function set_config()?
set_config(setting, new_value, is_local_to_transaction) (if not transaction-specific, then applies to the session).
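A sketch of current_setting() and set_config() working as a pair (the parameter and values are illustrative):

```sql
SELECT current_setting('work_mem');            -- show the current value
SELECT set_config('work_mem', '64MB', false);  -- false = session-wide, not transaction-local
SELECT current_setting('work_mem');            -- now reports '64MB'
```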
What is the system function for reloading the PostgreSQL config files?
pg_reload_conf() reloads the configuration files
What system function rotates the server’s log file?
pg_rotate_logfile
What two functions are used to start and stop online backup?
pg_start_backup(label, [fast]) and pg_stop_backup()
What is the function for determining the disk space used by a tablespace?
pg_tablespace_size(name_or_oid)
What is the function for determining the disk space used by a database?
pg_database_size(name_or_oid)
What is the function for determining the disk space used by a relation, not including indexes and toasted data?
pg_relation_size(name_or_oid)
What is the function for determining a relation’s TOTAL size, including indexes and TOAST data?
pg_total_relation_size(name_or_oid)
Is the size reported in the size column in the \d+ output the same as pg_relation_size() or pg_total_relation_size()?
Neither one!
What system function reports the number of bytes required to store the given operand?
pg_column_size(something)
List some file operation functions
pg_ls_dir, pg_read_file, pg_stat_file
What system function can be used to return filenames in a specified directory?
pg_ls_dir(dir_relative_to_data_dir) - superuser only. E.g.: select pg_ls_dir('.') lists all files in the data directory
What system function can read a file
pg_read_file(path) reads a file, one line per row
What psql command can you use to determine a function’s usage?
\df func_name
What system view can you use to see details of open connections and running transactions?
pg_stat_activity
What system view can you use to see the list of current_locks being held?
pg_locks
What system view can you use to see details on databases?
pg_stat_database
What system view can you use to see usage statistics for tables?
pg_stat_user_tables
What system view can you use to see usage statistics for indexes?
pg_stat_user_indexes
What system view can you use to see usage statistics for functions?
pg_stat_user_functions
Write a query to find the list of schemas currently in your search path
-- Show all schemas explicitly in search path:
select current_schemas(false);
You need to determine the names and definitions of all of the views in your schema. Create a report that retrieves view information: the view name and definition text.
select viewname, definition from pg_views where schemaname = 'edbstore';
Write a statement in psql to reload the config file.
select pg_reload_conf();
Create a report of all the users who are currently connected. The report must display the total session time of each connected user.
select usename as user, now()-backend_start as session_time from pg_stat_activity;
You find a user who has been connected to the server for a long time and decide to gracefully kill that connection. Write a statement to perform this task.
select pg_terminate_backend(pid) from pg_stat_activity where usename = 'blah';
Write a query to display name and size of all the databases in your cluster. Size must be displayed using a meaningful unit.
select datname, pg_size_pretty(pg_database_size(oid)) from pg_database order by pg_database_size(oid);
What does the COPY command do?
COPY moves data between tables and file-system files on the database server
What is the SQL command for importing data from a file or program execution into a table, and give basic syntax?
COPY table_name FROM 'filename'
-- or:
COPY table_name FROM PROGRAM 'command'
What does COPY … FROM do?
Copies data from a file into a table
Can COPY operate on selected columns in a table?
Yes
What is the SQL command for exporting data from a table or a query to a file (give basic syntax)?
COPY table_name TO 'filename'
-- or:
COPY table_name TO PROGRAM 'command'
What is the detailed syntax of COPY TO?
COPY { table_name [ ( column_name [, …] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, …] ) ]
where option can be one of:
FORMAT format_name
OIDS [ boolean ]
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, …] ) | * }
FORCE_NOT_NULL ( column_name [, …] )
ENCODING 'encoding_name'
What is the SQL command for exporting the contents of the emp
table to the file /tmp/emp.csv
on the database server, using a CSV format, with a header included?
COPY emp TO '/tmp/emp.csv' WITH (FORMAT CSV, HEADER);
-- Don't forget the parentheses around options!
How would you use the COPY command on a remote host using psql on that remote host to load data from a file on the local host (pretend that psql isn’t on the local host, or you don’t have permission to connect from the local host). Say the file is called emp.csv, the remote host is remote.host, the DB user and database are edbstore/edbstore, and the target table is emp.
cat emp.csv | ssh remote.host "psql -U edbstore edbstore -c 'copy emp from stdin;'"
What does COPY FREEZE do? Under what circumstances can you use it? What are the caveats? What is the syntax?
COPY tablename FROM 'filename' WITH (FREEZE)
will freeze the loaded rows. It can only be used if the target table was previously created or truncated in the same transaction. This prevents VACUUM from having to do this freezing at some point in the future. The caveat is that the rows will be visible to all other transactions as soon as they are loaded (before the end of the enclosing transaction), which is a violation of MVCC.
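A sketch of the required pattern, truncating (or creating) the target table inside the same transaction as the load (the table and file names are illustrative):

```sql
BEGIN;
TRUNCATE stage_emp;   -- or: CREATE TABLE stage_emp (...) in this transaction
COPY stage_emp FROM '/tmp/emp.csv' WITH (FORMAT CSV, FREEZE);
COMMIT;
```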
What SQL command would you use (as the superuser on the database server) to dump the contents of the emp table to a csv file, with column headers and a pipe delimiter?
COPY emp TO '/tmp/emp.csv' WITH (FORMAT CSV, HEADER, DELIMITER '|')
What SQL command would you use to create a table copyemp
with the same structure as the emp
table?
CREATE TABLE copyemp (LIKE emp);
Name some reliability features of PostgreSQL
- ACID-compliant
- Supports transactions
- Supports savepoints
- Uses Write-Ahead Logging
Name some features of PostgreSQL that increase scalability
- MVCC (connection scalability)
- Table partitioning (size)
- Tablespaces (size)
What is the maximum database size in PostgreSQL?
unlimited
What is the maximum table size?
32 TB
What is the maximum row size?
1.6 TB
What is the maximum field size?
1 GB
What is the maximum rows per table?
Unlimited
What is the maximum columns per table?
250-1600, depending on column types
What is the maximum indexes per table?
Unlimited
What is the PostgreSQL term for a table or index?
Relation
What is a “relation” in commercial database terminology?
A table or index
What is the PostgreSQL term for a row?
Tuple
What is a “tuple” in commercial database terminology
Row
What is the PostgreSQL term for a column
Attribute
What is an “attribute” in commercial database terminology?
Column
What was the academic precursor to PostgreSQL?
Ingres
At what institution was PostgreSQL first created?
UC Berkeley
What commercial database products trace their lineage to PostgreSQL or Ingres?
SQL Server, Informix, Ingres
What is the postmaster?
The postmaster listens for connections from clients and spawns a new backend process to handle each connection. The postmaster manages these backend processes as well as other background utility processes.