Basic Administration Flashcards

1
Q

Name some security features

A
  • SSL
    • host-based authentication
    • object-level permissions
    • logging
    • groups & roles
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Name some recovery and availability features

A
  • Streaming replication (async and sync), aka hot standby
    • Cascading streaming replication
    • pg_basebackup, hot backup
    • PITR (point in time recovery)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Name some cool advanced features

A
  • triggers and functions
    • many procedural languages (pgSQL, Perl, TCL, PHP, Java, Python, etc)
    • custom procedural languages
    • upgrade using pg_upgrade
    • unlogged tables
    • materialized views
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What does MVCC stand for

A

Multi-Version Concurrency Control

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

List some characteristics of MVCC

A
  • Maintains data consistency.
    • Each transaction sees a snapshot of the database as it was at the beginning of the transaction.
    • Writers block writers; nothing else blocks.
    • Concurrent transactions are isolated.
    • Prevents transactions from seeing inconsistent data.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What does WAL stand for?

A

Write-Ahead Logs (or Write-Ahead Logging)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How does WAL work?

A
  • Records each data change before it actually takes place.
    • Data is not considered ‘safe’ until the log is written to disk.
    • Provides recovery in case of crash or failure.
    • As DML is executed, changes are recorded in memory in WAL buffers (as well as the shared data buffers).
    • When COMMIT is executed, the WAL buffers are flushed to the WAL segment files
    • Later the dirty data buffers are written to disk
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is ACID?

A

Atomicity, Consistency, Isolation, Durability

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

List some commonly used pg_dump switches

A
  • a Data only
  • s Definitions (“schema”) only
  • n NAME Dump named schema (“namespace”)
  • t NAME Dump named table
  • Fp Plain format
  • Fc Custom format
  • Fd Directory format
  • Ft Tar format
  • f FILE Dump to named file
  • o Dump OIDs
  • j JOBS Parallelize directory format dumps
  • v Verbose
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Name four general methods for backup

A
  • SQL dump (pg_dump, pg_dumpall)
    • Filesystem dump
    • Continuous archiving
    • Streaming replication
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

List two methods (command line examples) for dealing with very large databases when doing SQL backups

A
  • pg_dump … | gzip -c > file.sql.gz

* pg_dump … | split -b 1m - filename

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Describe two methods for restoring SQL dumps done with pg_dump

A
  • psql < TEXT_FORMAT_DUMP_FILE (or psql -f TEXT_FORMAT_DUMP_FILE)
    • pg_restore NON_TEXT_FORMAT_DUMP_FILE
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Using command defaults for pg_dump, when you restore a backup into a new cluster, is there anything you need to do first?

A

Yes, create the target database. By default, pg_dump does not include the command to create the target database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What does the –create (-C) switch to pg_dump do?

A

pg_dump --create inserts a command to create and reconnect to the target database before the commands to populate that database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What does the –clean switch to pg_dump do?

A

pg_dump --clean drops the target database prior to recreating it, when using the –create (-C) switch.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

List important pg_restore options

A

pg_restore

  • d DB Connect to the specified database, and, if -C (–create) is not specified, restore into this database also.
  • C Create the database specified in the backup and restore into it
  • a Restore data only (assumes the SQL objects like tables have already been created)
  • s Restore object definitions (DDL) only, not data
  • t NAME Restore named table
  • n NAME Restore named schema/namespace
  • v Verbose
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What command can dump an entire cluster as SQL?

A

pg_dumpall

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What does pg_dumpall do?

A

pg_dumpall dumps all databases in the cluster, and also global objects such as roles and tablespaces.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

List commonly used pg_dumpall switches

A

pg_dumpall

  • a Dump data only
  • s Dump definitions only
  • c Clean (drop) objects before recreating
  • g Dump global objects (roles, etc) only, not databases
  • r Dump roles only, not databases or other global objects
  • O Skip ownership commands
  • x Skip privileges
  • -disable-triggers Disable triggers during data-only restore
  • v Verbose
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

How many times does pg_dumpall ask for authentication credentials?

A

Once per database (and more?)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Why is it a good idea to use a pgpass file when using pg_dumpall?

A

A pgpass file is a good idea for pg_dumpall because the user’s credentials are requested once per database in the cluster.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

List some characteristics of SQL dumps as a backup mechanism

A
  • Generate a text file containing SQL commands (or a binary representation thereof)
  • pg_dump is the relevant command (or pg_dumpall for the entire cluster)
  • pg_dump does not block readers or writers
  • pg_dump does not operate with special permissions
  • pg_dump dumps are internally consistent and are a snapshot of the database at the time pg_dump begins running
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Next to SQL dump, what is the next most simple backup approach?

A

Simple file system level backup is an alternative to SQL dumps, provided that either 1) the database is shut down during the backup, or 2) the native snapshot feature of the filesystem is used (if available).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Why mightn’t filesystem snapshots work without database downtime, even if your filesystem supports this feature?

A

Filesystem snapshots might not work if the database is spread across multiple filesystems; if not, you would have to stop the database to take the multiple snapshots.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Describe the process of restoring a backup made via a filesystem snapshot
A filesystem snapshot backup can be restored by 1) copying the backup into place (including the WAL logs) and 2) starting the database, which will go into crash recovery mode and replay WAL logs.
26
How can you take your filesystem snapshot backup so that the restore is as fast as possible?
To minimize restoration time of a filesystem snapshot, force a CHECKPOINT right before the snapshot -- this will minimize the amount of WAL logs that have to be replayed.
27
Can you use tar to back up a database? If so, describe how.
Yes, you can back up a PostgreSQL database by 1) shutting down the database; 2) using tar to copy *all* of the files in the database cluster; and 3) restarting the database.
28
How would you use rsync to backup a database?
You can back up a database using rsync as follows: 1) With the database running, use rsync to copy all the files. 2) Shut down the database. 3) Use rsync again to copy the files; this second rsync will create a consistent image of the database and will be quite fast, minimizing downtime.
29
After SQL dumps and simple filesystem backups, what is a third , more complicated mechanism for backups?
A third backup approach is continuous archiving of WAL logs (combined with a possibly inconsistent filesystem backup, such as produced by tar, even with the database running).
30
What is another name for continuous archiving as used for backup purposes?
Online backup
31
What is the primary purpose of WAL logs?
The primary purpose of WAL logs is to allow database commits to happen quickly (without the data being fully written to the final data pages) but to prevent loss of information in case of a crash -- the WAL logs can be "played" when the database starts up after a crash, thus restoring the physical database to match its logical state at the time of the crash.
32
What is a secondary use to which WAL logs can be put?
Online backup -- first, get WAL log archiving started, in which full and switched WAL logs are copied to backup storage before being recycled; then take a file system backup while the database is running (tar, rsync, etc). If the database crashes, or if you want to revert the database to a specified point in time (PIT), you can copy the original full backup into place along with the archived WAL files, and start the database.
33
How do you enable continuous archiving of WAL files?
* wal_level must be set to `archive` or `hot_standby` (as opposed to, say, `minimal`). * archive_mode must be set to `on` (default is `off`). * archive_command must be defined -- a command to copy the WAL files somewhere.
34
How can you achieve point in time recovery (PITR) with continuous archiving?
PITR can be accomplished by having the database replay the WAL files only up to a specified file, not all the way to the last file.
35
How can you use continuous archiving to achieve warm standby?
Have a second server loaded with the base backup file (filesystem-level backup), and feed the archived WAL files to this second server. At any time, the WAL files can be replayed on this second server so that the second server can take over from the first with a nearly identical state.
36
Why can't pg_dump and pg_dumpall be used with online backup via continuous archiving?
pg_dump and pg_dumpall produce logical, not physical backups of the database. They don't capture enough information for the WAL files to
37
What is the default size of WAL segment files?
16MB -- this can be changed by recompiling PostgreSQL
38
How are WAL segment files named?
Numerically, according to the position in the abstract WAL sequence. I say abstract sequence because there is only a small number of physical files, which are recycled by renaming when a particular WAL file is no longer needed (it has been checkpointed; i.e. the changes it encodes have been reflected to the actual data pages).
39
What does the simplest useful archive_command value on Unix systems?
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
40
What does %f mean in the archive_command?
%f is the base file name of the WAL file
41
What does %p mean in the archive_command?
%p is the full path name of the WAL file
42
How do you specify a literal percent sign in the archive_command?
%%
43
Describe the return value/exist status of the archive_command
0 if the file could be copied successfully; otherwise, non-zero.
44
What happens if the archive_command starts returning non-zero for every file (perhaps because of network error, unmounted fs, etc)?
The pg_xlog directory will continue to fill up with WAL files. If the containing filesystem fills up, PostgreSQL will do a "panic" shutdown -- no committed transactions will be lost, however.
45
Does continuous archiving back up changes made to postgresql.conf, pg_hba.conf, etc?
No, you must have another approach to back up changes to the configuration files in the data directory.
46
If you exclude the PostgreSQL data directory from the server's normal backup procedures, how can the config files be continuously backed up (since continuous archiving does not do so)?
Start PostgreSQL with `-D config_dir`, where config_dir is a directory outside of the data directory, some place where the normal operating system backup procedures will back the directory up. Then, in postgresql.conf within this directory, use the `data_directory` parameter to point to the actual data directory.
47
How could you temporarily turn off WAL archiving without stopping the server?
Set archive_command to the empty string and reload the server -- WAL files will start to accumulate in the pg_xlog directory, though.
48
What is pg_basebackup?
pg_basebackup is a command for taking a base backup of the live database, to which you would apply archived WAL files in order to recover from a disaster.
49
Describe pg_receivexlog
* streams transaction logs from a running cluster * uses streaming replication protocol * these files can be used for PITR * logs streamed in real time * can be used instead of archive command * example: pg_receivexlog -h localhost -D /usr/local/pgsql/archive
50
Describe how the archive_command `cp -i %p /mnt/server/archivedir/%f
Uses the Unix file copy (cp) command to copy the source WAL file (%p - path) to the desired destination path (/mnt/server/archivedir/%f). The twist is that the interactive (-i) switch is used, presumably to prevent an already existing file of the same name from being overwritten. "But wait", you say, "how can this command be interactive? There's no human involved!" That's where the redirection from /dev/null comes in -- if a file would be overwritten, the prompt is written, and the empty response causes an error (status 1), which is what we would want in this scenario.
51
Describe the "low level API" for base backup that is an alternative to using the pg_basebackup command
* Connect with psql and issue the command "select pg_start_backup('backup label');" * Back up the data directory using tar or rsync, etc. * Now issue the SQL command "select pg_stop_backup()". [Note: no label is required for the stop; it's assumed (or enforced) that there is only one backup happening at a time.]
52
What CLI command can be used to take a base backup of a live cluster?
pg_basebackup
53
Can a backup taken by pg_basebackup be used for PITR?
Yes
54
Can a backup taken by pg_basebackup be used for streaming replication?
Yes
55
What does pg_basebackup do, exactly?
It uses the low-level backup API (pg_start_backup('label') and pg_stop_backup()) wrapped around some binary copying mechanism, i.e. it automatically puts the database in and out of backup mode.
56
List important pg_basebackup switches/options
- D - destination for backup - F

- format: plain or tar - X - include log files created during the backup, so the backup is fully usable - z - compress with gzip - Z - compression level - P - enable progress reporting - h - host on which cluster is running - p - cluster port

57
Describe what this command is doing: `pg_basebackup -h localhost -D /usr/local/pgsql/backup`
pg_basebackup is being used to take a base backup of the cluster running on localhost; the files will be written to /usr/local/pgsql/backup/.
58
Describe the configuration steps required to use pg_basebackup
* Modify pg_hba.conf to add a replication connection, e.g.: "host replication postgres IP_ADDR/32 trust" * archive_command = 'cp -i %p /dest/dir/%f * archive_mode = on # requires restart * max_wal_senders = 3 wal_keep_segments = NUM wal_level = archive
59
Describe the steps to performing (Point-In-TIme) Recovery
* Stop the server (if it is running). * If you have enough space, keep a copy of the data directory and the transaction logs. * Remove all directories and files from the cluster data directory. * Restore the database files from the base backup (file system backup). * Verify the ownership of restored backup directories (must not be root). * Remove any files in pg_xlog/. * If you have any unarchived WAL segment files recovered from the crashed cluster, copy them in pg_xlog. * Create a recovery command file recovery.conf in the cluster data directory. * Start the server. * Upon completion of the recovery process, the server will rename recovery.conf to recovery.done.
60
List the settings (with types) in the recovery.conf file used for point-in-time recovery
``` restore_command (string) # unix: restore_command = 'cp /archive/dir/%f "%p"' # windows: restore_command = 'copy c:\\archive\\\dir\\"%f" "%p"' recovery_target_name (string) recovery_target_time (timestamp) recovery_target_xird (string) recovery_target_inclusive (boolean) recovery_target_timeline (string) pause_at_recovery_target (boolean) ```
61
Why might one choose to set `archive_timeout` to something other than the default of 0?
When doing WAL archiving, remember that only full (16MB) WAL files are shipped, so if the transaction rate and volume are low, you could be exposed to losing data in case of catastrophe. The archive_timeout forces a WAL log to be shipped after the specified number of seconds has passed, regardless of whether it is full or not. If the value is too low (but not zero), it could lead to WAL bloat in the archive.
62
Describe what the pg_xlogdump contrib module and how to use it
* pg_xlogdump displays the WAL in human-readable format * can give wrong results when the server is running * Syntax: pg_xlogdump [[startseg] [endseg]]
63
What is the name of the contrib module that can display the WAL in a human-readable format
pg_xlogdump
64
What are the two methods for upgrading PostgreSQL?
* ) Old-school: pg_dump and then restore into the new cluster * ) New-school: pg_upgrade
65
List important features of pg_upgrade
* Helps upgrade between major releases of PostgreSQL (that would ordinarily require dump and restore) * Supports upgrading PG 8.3.X or later to latest version * Verifies old and new clusters are binary compatible * Option for checking clusters (?) * Can do in-place or side-by-side upgrade * Side-by-side upgrade requires double storage * Can be done with parallel jobs in PG 9.3+
66
List important pg_upgrade options/switches
- b old_bindir - B new_bindir - d old_datadir - D new_datadir - p old_port - P new_port - c check clusters only (no change) - j # of jobs - k use hard links instead of copying files - r retain SQL and log files even after successful completion - u user_name
67
What PG architecture component creates execution plans?
Planner
68
What PG architecture component chooses the most efficient plan for a query?
Optimizer
69
List some execution plan components
``` row estimates (cardinality) access method: sequential or index join method: hash, nested loop, etc join type and order sorting and aggregates ```
70
Describe EXPLAIN usage
``` EXPLAIN (options) statement where option can be one of: ANALYZE VERBOSE COSTS BUFFERS TIMING FORMAT {TEXT|XML|JSON|YAML} ```
71
In the explain output: | seq scan on emp (cost=X..Y rows=R width=W) what are X, Y, R, and W?
X=estimated startup cost Y=estimated total cost R=estimated rows output W=estimated average width in bytes of rows output
72
Plans are made and costed using table statistics - what are two primary table statistics?
Table statistics in pg_class include retuples, the estimated number of rows in a table, and relpages, the estimated number of disk blocks taken up by a table or index.
73
Are table statistics updated in real time?
No, table statistics are updated using ANALYZE or VACUUM ANALYZE
74
What tables are the table statistics stored in?
pg_class and pg_statistic store the table statistics; pg_stats is a view on pg_statistic that is more commonly used.
75
What gets analyzed when you run 'ANALYZE' (with no options)?
The tables and indexes in the current database that the current user owns (is that right?)
76
Can you ANALYZE just a single table or column
Yes - "ANALYZE some_table" or "ANALYZE some_table(some_col)"
77
Does autovacuum run ANALYZE
Yes, autovacuum runs ANALYZE by default.
78
Does a DELETE immediately remove a row?
No, DELETE merely marks a row as deleted; the row space can be reused or removed by VACUUM .
79
What are four useful functions of VACUUM?
* Can recover or reuse space occupied by obsolete rows (from deletes and updates) * Can (via ANALYZE option) update data statistics * Updates the visibility map, which speeds up index-only scans * Protects against loss of very old data due to transaction ID wraparound
80
What is the visibility map useful for?
It makes index-only scans more efficient (and more likely to be chosen by the query planner) - the full heap tuple doesn't need to be read in order to determine whether an index entry is valid or not.
81
What does plan VACUUM do (as opposed to VACUUM full)?
* removes dead rows and marks the space available for future reuse. * Does *not* shrink the file *except* for dead rows at the end of the table
82
What does VACUUM FULL do that plain vacuum does not do?
* More aggressive algorithm (what does this mean?) * Rewrites the entire table with no dead space * Takes a hell of a lot more time * Requires at least twice the disk space of the original copy of the table
83
Describe the problem of transaction ID wraparound failure?
Each transaction has a 32-bit ID (XID) that is allocated serially, and each row version is marked with the XID of the transaction that created it. A transaction is not allowed to see rows with XIDs that are larger than its own, because these are "in the future". But because there is a limit to how large XIDs can be (2^32, approx 4 billion), at some point XIDs wrap around to 0, at which point a row with a new, low XID is seen as older than higher XIDs, when actually it is newer. Vacuuming can "freeze" old rows by assigning them a special XID that means "old, visible by all". This prevents wraparound failure, although I am not sure about the details.
84
How often must a table be vacuumed to avoid transaction ID wraparound failure?
At least once every two billion transactions.
85
What is the visibility map?
The visibility map for a table keeps track of which pages contain only tuples that are visible to all current and future transactions (until the page is modified, anyway). This has two benefits: 1) it helps vacuum avoid looking at pages unnecessarily, and 2) it allows index scans to avoid grabbing the heap tuple merely to determine if the current transaction is allowed to see the tuple for an index entry. The VM is tiny compared to the table/heap, so for very large tables, the cost savings can be significant. This allows "index-only scans" to be used.
86
What is the name of the file that contains the visibility map for a table?
_vm
87
How does the visibility map get updated?
VACUUM updates it.
88
What can you say about the size of the visibility map of a table?
The visibility map is very small, so it is readily cached, and many index entries can be checked for visibility with very little memory or disk I/O.
89
When do you need to run REINDEX?
REINDEX should be run when: * an index is corrupted (rare) * an index is bloated (many almost pages) * a storage parameter for the index (like fillfactor) has been changed * an index build with CONCURRENTLY failed, leaving an invalid index
90
What is the syntax for REINDEX?
REINDEX {INDEX|TABLE|DATABASE|SYSTEM} name [FORCE]
91
Where is meta-information about tables and other objects stored?
The pg_catalog schema is where system information about a database is stored.
92
What sorts of objects are stored in pg_catalog?
The following objects are stored in the pg_catalog schema: * System tables (like pg_class) * System functions (e.g. pg_database_size) * System views (pg_stat_activity)
93
Is pg_catalog always part of the search path?
Yes, pg_catalog is effectively part of the search_path.
94
What psql command lists system information tables and views?
The \dS psql command lists tables and views from the pg_catalog schema (in addition to other tables and schemas in the search path).
95
What are the system catalog tables holding tables, constraints, indexes, triggers, and views?
pg_tables, pg_constraint (no s), pg_indexes, pg_trigger (no s), and pg_views - (thanks for the consistency, guys!) These are all views except for pg_constraint.
96
What is the system function for showing the database to which you are connected?
current_database() is the system function for showing the database to which you are connected
97
What is the system function for showing the first schema in the search path?
current_schema() is the system function for showing the first schema in the search path.
98
What are the system functions for showing the client IP address and port and server address and port for the current connection?
inet_client_addr, inet_client_port, inet_server_addr, inet_server_port
99
What is the system function for telling how long the cluster has been up?
pg_postmaster_start_time
100
What are the columns in the \dS output?
schema, relation name, relation type, and owner
101
What extra columns are shown by \dS+?
size and description (the latter is empty for system objects)
102
What is the difference between the session_user and current_user/user system functions?
session_user() is analogous to the UNIX "real user", current_user() to the "effective user"
103
What system function returns the array of schemas in the search path?
current_schemas(boolean)
104
What does the boolean parameter to the current_schemas() function do?
If true, the schemas implicitly in the search path (usually pg_catalog) are also included. If false, just the schemas from the normal explicit search_path are included.
105
What is the system function for returning the value of a configuration parameter/variable?
current_setting(setting_name_str)
106
What is the system function for modifying a configuration variable?
set_config()
107
What SQL command is the current_setting() function equivalent to?
SHOW
108
What SQL command is the set_config() function equivalent to?
SET
109
What system function cancels the current query of a backend process?
pg_cancel_backend(pid) cancels the current query in a backend process - the argument is pid
110
What system function terminates/kills a backend process?
pg_terminate_backend(pid)
111
What is the usage for the system function set_config()?
set_config(setting, new_value, is_local_to_transaction) (if not transaction-specific, then applies to the session).
112
What is the system function for reloading the PostgreSQL config files?
pg_reload_conf() reloads the configuration files
113
What system function rotates the server's log file?
pg_rotate_logfile
114
What two functions are used to start and stop online backup?
pg_start_backup(label, [fast]) and pg_stop_backup()
115
What is the function for determining the disk space used by a tablespace?
pg_tablespace_size(name_or_oid)
116
What is the function for determining the disk space used by a database?
pg_database_size(name_or_oid)
117
What is the function for determining the disk space used by a relation, not including indexes and toasted data?
pg_relation_size(name_or_oid)
118
What is the function for determining a relation's TOTAL size, including indexes and TOAST data?
pg_total_relation_size(name_or_oid)
119
Is the size reported in the size column in the \d+ output the same as pg_relation_size() or pg_total_relation_size()?
Neither one!
120
What system function reports the number of bytes required to store the given operand?
pg_column_size(something)
121
List some file operation functions
pg_ls_dir, pg_read_file, pg_stat_file
122
What system function can be used to return filenames in a specified directory?
pg_ls_dir(dir_relative_to_data_dir) - superuser only. E.g.: select pg_ls_dir('.') lists all files in the data directory
123
What system function can read a file
pg_read_file(path) reads a file, one line per row
124
What psql command can you use to determine a function's usage?
\df func_name
125
What system view can you use to see details of open connections and running transactions?
pg_stat_activity
126
What system view can you use to see the list of current_locks being held?
pg_locks
127
What system view can you use to see details on databases?
pg_stat_database
128
What system view can you use to see usage statistics for tables?
pg_stat_user_tables
129
What system view can you use to see usage statistics for indexes?
pg_stat_user_indexes
130
What system view can you use to see usage statistics for functions?
pg_stat_user_functions
131
Write a query to find the list of schemas currently in your search path
``` -- Show all schemas explicitly in search path: select current_schemas(False); ```
132
You need to determine the names and definitions of all of the views in your schema. Create a report that retrieves view information: the view name and definition text.
select viewname, definition from pg_views where schemaname = 'edbstore';
133
Write a statement in psql to reload the config file.
select pg_reload_conf();
134
Create of report of all the users who are currently connected. The report must display total session time of all connected users.
select usename as user, now()-backend_start as session_time from pg_stat_activity;
135
You found a user connected to server since long time and decided to gracefully kill its connection. Write a statement to perform this task.
select pg_terminate_backend(pid) from pg_stat_activity where usename = 'blah';
136
Write a query to display name and size of all the databases in your cluster. Size must be displayed using a meaningful unit.
select datname, pg_size_pretty(pg_database_size(oid)) from pg_database order by pg_database_size(oid);
137
What does the COPY command do?
COPY moves data between tables and file-system files on the database server
138
What is the SQL command for importing data from a file or program execution into a table, and give basic syntax?
COPY table_name FROM 'filename' -- or: COPY table_name FROM PROGRAM 'command'
139
What does COPY ... FROM do?
Copies data from a file into a table
140
Can COPY operate on selected columns in a table?
Yes
141
What is the SQL command for exporting data from a table or a query to a file (give basic syntax)?
COPY table_name TO 'filename' -- or: COPY table_name TO PROGRAM 'command'
142
What is the detailed syntax of COPY TO?
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) } TO { 'filename' | PROGRAM 'command' | STDOUT } [ [ WITH ] ( option [, ...] ) ] where option can be one of: FORMAT format_name OIDS [ boolean ] FREEZE [ boolean ] DELIMITER 'delimiter_character' NULL 'null_string' HEADER [ boolean ] QUOTE 'quote_character' ESCAPE 'escape_character' FORCE_QUOTE { ( column_name [, ...] ) | * } FORCE_NOT_NULL ( column_name [, ...] ) ENCODING 'encoding_name'
143
What is the SQL command for exporting the contents of the `emp` table to the file `/tmp/emp.csv` on the database server, using a CSV format, with a header included?
COPY emp TO '/tmp/emp.csv' WITH (FORMAT CSV, HEADER); | -- Don't forget the parentheses around options!
144
How would you use the COPY command on a remote host using psql on that remote host to load data from a file on the local host (pretend that psql isn't on the local host, or you don't have permission to connect from the local host). Say the file is called emp.csv, the remote host is remote.host, the DB user and database are edbstore/edbstore, and the target table is emp.
cat emp.csv | ssh remote.host "psql -U edbstore edbstore -c 'copy emp from stdin;'"
145
What does COPY FREEZE do? Under what circumstances can you use it? What are the caveats? What is the syntax?
`COPY tablename FROM filename (FREEZE)` will freeze the loaded rows. It can only be used if the target table was previously created or truncated in the same transaction. This prevents VACUUM from having to do this freezing at some point in the future. The caveat is that the rows will be visible to all other transactions as soon as they are loaded (before the end of the enclosing transaction) -- this is a violation of MVCC.
146
What SQL command would you use (as the superuser on the database server) to dump the contents of the emp table to a csv file, with column headers and a pipe delimiter?
COPY emp TO '/tmp/emp.csv' WITH (FORMAT CSV, HEADER, DELIMITER '|')
147
What SQL command would you use to create a table `copyemp` with the same structure as the `emp` table?
CREATE TABLE copyemp (LIKE emp);
148
Name some reliability features of PostgreSQL
* ACID-compliant * Supports transactions * Supports savepoints * Uses Write-Ahead Logging
149
Name some features of PostgreSQL that increase scalability
* MVCC (connection scalability) * Table partitioning (size) * Tablespaces (size)
150
What is the maximum database size in PostgreSQL?
unlimited
151
What is the maximum table size?
32 TB
152
What is the maximum row size?
1.6 TB
153
What is the maximum field size?
1 GB
154
What is the maximum rows per table?
Unlimited
155
What is the maximum columns per table?
250-1600, depending on column types
156
What is the maximum indexes per table?
Unlimited
157
What is the PostgreSQL term for a table or index?
Relation
158
What is a "relation" in commercial database terminology?
A table or index
159
What is the PostgreSQL term for a row?
Tuple
160
What is a "tuple" in commercial database terminology
Row
161
What is the PostgreSQL term for a column
Attribute
162
What is an "attribute" in commercial database terminology?
Column
163
What was the academic precursor to PostgreSQL?
Ingres
164
At what institution was PostgreSQL first created?
UC Berkeley
165
What commercial database products trace their lineage to PostgreSQL or Ingres?
SQL Server, Informix, Ingres
166
What is the postmaster?
The postmaster listens for connections from clients and spawns new a new backend process to handle each connection. The postmaster manages these backend processes as well as other background utility processes
167
Does PostgreSQL use threads?
No, it uses processes
168
What are three major components of shared memory in PostgreSQL?
* Shared buffers (data buffers) * WAL buffers * Process array
169
What are seven background utility processes managed by the postmaster?
* bgwriter * stats collector * checkpointer * archiver * autovacuum * log writer * WAL writer
170
What are four major categories of disk files used by PostgreSQL?
* Data files and friends (indexes, visibility map) * WAL segments * Archived WAL * Error/diagnostic log files
171
What does the background writer do?
Writes dirty data blocks to disk when room is needed for more blocks in shared memory.
172
What background process writes dirty data blocks to disk?
background writer (bgwriter)
173
What does the WAL writer process do?
Flushes write-ahead log to disk
174
What flushes write-ahead log data to disk?
The WAL writer process
175
What background process automatically performs a checkpoint (ensures that dirty data blocks are written) at certain intervals or other configured conditions
The checkpointer process
176
What does the checkpointer process do?
It performs checkpoints (syncing of dirty data blocks to disk) at intervals or otherwise according to configuration parameters
177
Is there one autovacuum process?
There is one autovacuum launcher process, which launches multiple autovacuum workers processes
178
What starts autovacuum worker processes?
The autovacuum launcher process
179
What does the autovacuum launcher process do?
Launches autovacuum worker processes
180
What do autovacuum worker processes do?
Recover free space for reuse
181
What processes recover free space for reuse?
Autovacuum worker processes
182
What background process routes log messages to syslog, eventlog, or log files
Logging collector
183
What does the logging collector process do?
Routes log messages to syslog, eventlog or log files
184
What background process collects usage statistics by relation and block?
Stats collector
185
What does the stats collector background process do?
Collects usage statistics by relation and block
186
What archives write-ahead log files
Archiver
187
What does the archiver process do?
Archives write-ahead log files in pg_xlog when full (e.g. copies them to a mounted SAN share).
188
What are the mechanisms for communication between the postmaster and its subordinate processes?
Shared memory and semaphores
189
On what bases can a connection be authenticated?
IP address, user, password, key
190
What does authorization consist of?
Verifying permissions in the database
191
What is the PostgreSQL buffer cache called?
Shared buffers
192
What is the purpose of the shared buffers?
To read OS and disk reads
193
When are shared buffer blocks written to disk?
Shared buffer blocks are written to disk only when needed: 1) to make room for new block 2) at checkpoint time
194
Describe the life history of write-ahead logging data
When DML is executed to change data, the changes are made to the data blocks in shared memory and also (in a different form) to WAL buffers in shared memory. The WAL writer process flushes WAL buffers to WAL segment files on disk (the "transaction log") periodically, or on commit, or when the WAL buffers are full. As of 9.2, there is a group commit feature which attempts to batch together WAL-writing from multiple commits that occur nearly at the same time.
195
Describe the state of data changes before commit, after commit, and after checkpoint
* Before commit: changes are stored in memory in the shared data buffers and also in the WAL buffers. (It is possible under conditions of high activity and/or tight memory that changes may be forced out to WAL files and data files). * After commit: changes have been written to write-ahead log files on disk (but not necessarily to the data files). * After checkpoint: changes have been written from the shared buffers to data files.
196
What are the three main stages of statement processing?
Parsing, optimizing, and execution
197
What are the components of statement parsing?
1. Syntax check 2. Call Traffic Cop (what *is* that?) 3. Identify query type 4. Command processor if needed 5. Break query in tokens
198
What are the components of statement optimization?
1. Planner generates plans using database statistics 2. Query cost calculation 3. Choose best plan
199
What are the components of statement execution (after parsing and optimization are done)?
Haha, fooled you. Just one step: execution.
200
What is a database cluster?
A cluster is a collection of one or more databases managed by one server instance
201
Complete the sentence: if there are multiple clusters on a single computer, each one will have a different ______, ______, and ________.
Each cluster has a separate: * data directory * TCP port * set of processes
202
What are the items that you will find in the /data directory by default?
PG_VERSION A file containing the major version number of PostgreSQL base - Subdirectory containing per-database subdirectories global - Subdirectory containing cluster-wide tables, such as pg_database pg_clog - Subdirectory containing transaction commit status data pg_multixact - Subdirectory containing multitransaction status data (used for shared row locks) pg_notify - Subdirectory containing LISTEN/NOTIFY status data pg_serial - Subdirectory containing information about committed serializable transactions pg_snapshots - Subdirectory containing exported snapshots pg_stat_tmp - Subdirectory containing temporary files for the statistics subsystem pg_subtrans - Subdirectory containing subtransaction status data pg_tblspc - Subdirectory containing symbolic links to tablespaces pg_twophase - Subdirectory containing state files for prepared transactions pg_xlog - Subdirectory containing WAL (Write Ahead Log) files postmaster. opts - A file recording the command-line options the server was last started with postmaster. pid A lock file recording the current postmaster process ID (PID), cluster data directory path, postmaster start timestamp, port number, Unix-domain socket directory path (empty on Windows), first valid listen_address (IP address or *, or empty if not listening on TCP), and shared memory segment ID (this file is not present after server shutdown) Usually: postgresql.conf Usually: pg_hba.conf Usually: pg_ident.conf
203
What does the data/global subdirectory contain?
Cluster-wide system tables, like pg_database
204
What does the data/base subdirectory contain?
Per-database subdirectories for databases having data in the default tablespace
205
What does the data/pg_xlog subdirectory contain?
Write Ahead Log files
206
What does the data/pg_clog subdirectory contain?
Transaction commit status data
207
What does the data/pg_subtrans subdirectory contain?
Subtransaction commit status data
208
What does the pg_multixact subdirectory contain
Multitransaction status data (used for shared row locks)
209
What does the pg_notify subdirectory contain?
LISTEN/NOTIFY status data
210
What does the data/pg_serial subdirectory contain?
Information about committed serializable transactions
211
What does the data/pg_snapshots subdirectory contain?
Exported snapshots
212
What does the data/pg_stat_tmp subdirectory contain?
Temporary files for the statistics subsystem
213
What does the data/tblspc subdirectory contain?
Symbolic links to tablespaces
214
What is data/postmaster.opts?
A file recording the command-line options the server was last started with
215
postmaster.pid
A lock file recording the current postmaster process ID (PID), cluster data directory path, postmaster start timestamp, port number, Unix-domain socket directory path (empty on Windows), first valid listen_address (IP address or *, or empty if not listening on TCP), and shared memory segment ID (this file is not present after server shutdown)
216
What does the pg_twophase subdirectory contain?
State files for prepared transactions
217
Describe the contents of the data/base directory
One directory per database in the cluster, the directory being named with the OID of the database. This is the default location for the cluster's databases and files, and the system catalogs are stored here at a minimum
218
Describe the contents of the data/base/NNN directory, where NNN is the OID of some database
One or more files for each table or index in the database. For ordinary relations, these files are named after the table or index's filenode number, which can be found in pg_class.relfilenode. But for temporary relations, the file name is of the form tBBB_FFF, where BBB is the backend ID of the backend which created the file, and FFF is the filenode number. In either case, in addition to the main file (a/k/a main fork), each table and index has a free space map, which stores information about free space available in the relation. The free space map is stored in a file named with the filenode number plus the suffix _fsm. Tables also have a visibility map, stored in a fork with the suffix _vm, to track which pages are known to have no dead tuples. Unlogged tables and indexes have a third fork, known as the initialization fork, which is stored in a fork with the suffix _init.
219
If a database has multiple tablespaces, where do those tablespaces go?
Each tablespace is a directory. `base` is the default tablespace. All other tablespaces can be located anywhere, but a soft link to each tablespace must be placed in data/tblspc/.
220
Can a database have files in multiple tablespaces, and if so, what is the physical manifestation of this in the filesystem?
Yes, a database can have files in multiple tablespaces. Each tablespace has a subdirectory for each database that has files in the tablespace, so the OID of the tablespace may appear in multiple tablespace directories.
221
What is the maximum size for a physical file representing a table or index?
1GB
222
Given the maximum size for a physical table/index file, how is it that the maximum table size is 32 TB? Describe how files for tables/indexes are named
* A table or index is stored in one or more physical files. * For non-temporary relations, the first such file is named as the relation's file node number (pg_class.relfilenode), and subsequent files for the same relation are named _N where N is a serial number. * For temporary relations, the file name is of the form tBBB_FFF, where BBB is the backend ID of the backend which created the file, and FFF is the filenode number
223
In addition to the data files for a table, what other files for the table may there be, and how are they named?
* Visibility map: RELFILENODE_vm * Free space map: RELFILENODE_fsm * (For unlogged relations): initialization fork: RELFILENODE_init
224
List the contents of the PostgreSQL installation directory
``` bin - programs data - data directory doc - documentation include - header files installer, scripts - installer files (EDB) lib - libraries pgAdmin III (EDB) StackBuilder (EDB) pg_env.{bat,sh} ```
225
What are the five main components of a relation data page?
* Page header - approx 24 bytes long; pointer(s) to free space in the page; general info * Row/index pointers - array of offset/length pairs pointing to row or index entries later in the page * Free space - unallocated space. New pointers are allocated from the front , new rows/index entries from the rear * Row/Index entries - actual row or index entries * Special - index access method-specific data (empty in regular tables) [Really? Empty?]
226
Does Windows have locked user accounts?
No; a password is required
227
What are the three installation options?
* EDB One-Click Installer * OS system package (RPM/YUM, Debian/Ubuntu DEB, FreeBSD port, Solaris package, Mac OS X Homebrew * Source code
228
What are five primary environment variables that affect the running of PostgreSQL commands, including starting the server. (Hint: the first one is not PostgreSQL-specific).
PATH - should include the correct PG bin directory PGDATA - points to data cluster directory PGPORT - point to port on which cluster is running PGUSER - default database user name PGDATABASE - default database
229
How do you set environment variables in UNIX systems?
Edit your shell .profile or .bash_profile
230
How do you set environment variables in Windows
Use the Windows My Computer properties page
231
What command is used to create a PostgreSQL cluster?
initdb
232
What is initdb?
initdb creates a database cluster's data directory
233
What is the syntax for initdb
initdb –D - a - specifies the authentication method for local users - D - Database cluster directory - U - Select the database super user name - E - Specify the database encoding - k --data-checksums - Use checksums on data pages to help detect corruption - W – prompt for superuser password - X, --xlogdir=XLOGDIR location for the transaction log directory
234
What files should be edited after creating a new database cluster?
postgtesql.conf - to set the correct listening address and port, and to set appropriate configuration in general pg_hba.conf - to define what users should be able to connect to which databases from which IP addresses
235
What PG command is usually used to start and stop the cluster?
pg_ctl
236
Describe the pg_ctl subcommands
−pg_ctl initdb [] - creates a new PostgreSQL database cluster −pg_ctl start [] - Start the server −pg_ctl stop [] - Stop the server −pg_ctl restart [] - Restart the server −pg_ctl status [] - Display server status −pg_ctl reload [] - Reload configuration file −pg_ctl promote [-D DATADIR] – Promote Standby to be Primary −pg_ctl kill signal_name process_id – send a signal(ABRT HUP INT QUIT TERM USR1 USR2) to a process −Pg_ctl register|unregister - a system service on Microsoft Windows
237
What pg_ctl switch/option is particularly useful when stopping or restarting the server? Describe its usage
−-m smart (the defaults) waits for all clients to exit −-m fast rolls back active transactions, closes open connections, and shuts down cleanly −-m immediate performs an immediate, abnormal shutdown (i.e. a crash)
238
Besides, -m, what are other useful pg_ctl options/switches?
−-D to specify an alternate cluster location −-l to specify an alternate log file, when starting the server −-c, --core-files allow postgres to produce core files Starting and Stopping the Server (pg_ctl)
239
Inside a psql session, how can you change the database connection?
\c [DBNAME [USERNAME]]
240
What are six common connection settings?
* listen_addresses (default localhost) - IP addresses to listen on; '*' means all * port (default 5432) - port to listen on * max_connections (default 100) - max concurrent connections * superuser_reserved_connections (default 3) - number of connections reserved for superusers (out of the defined max_connections) * unix_socket_directory (default /tmp) - directory to be used for UNIX socket connections * unix_socket_permissions (default 0777) - access permissions of the UNIX-domain socket
241
List seven security and authentication settings (hint: six have to do with ssl)
* authentication_timeout (default 1 minute) * ssl (default: off) - enable SSL connections * ssl_ca_file - SSL certificate authority file * ssl_cert_file - SSL certification * ssl_key_file - SSL private key * ssl_ciphers - list of eligible SSL ciphers * ssl_renegotiation_limit (default 512 MB) - how much data can flow through the connection before renegotiation occurs
242
List five memory settings
* shared_buffers (default: <=128MB) - size of shared buffer pool; rule of thumb: 25% of system memory to a max of 8GB on Linux, or 512 MB on Windows * temp_buffers (default: 8 MB) - amount of memory used by each backend for caching temp table data * work_mem (default: 1MB) - amount of memory used for each sort or hash operation before switching to temporary disk files * maintenance_work_mem (default: 16 MB) - amount used for each index build or VACUUM * temp_file_limit (default: -1) - amount of disk space that a session can use for temporary files. Default is unlimited. Attempting to exceed the limit will abort a transaction.
243
What is a rule of thumb for setting shared_buffers on Windows?
25% of system memory up to a maximum of 512 MB.
244
What is a rule of thumb for setting shared_buffers on UNIX systems?
25% of system memory, up to a maximum of 8 GB.
245
What is shared_buffers?
Memory setting controlling the size of the shared data buffer cache. Default is <= 128 MB.
246
What is temp_buffers?
temp_buffers (default: 8MB): Amount of memory used by each backend for caching temporary table data.
247
What is work_mem?
work_mem (default: 1MB): Amount of memory used for each sort or hash operation before switching to temporary disk files. Default is conservative, but don't overdo it.
248
What is maintenance_work_mem?
maintenance_work_mem (default: 16MB): Amount of memory used for each index build or VACUUM.
249
What is temp_file_limit?
temp_file_limit (default -1): amount of disk space that a session can use for temporary files. A transaction attempting to exceed this limit will be cancelled. Default is unlimited. Memory Settings
250
What are three commonly used query planner settings? In addition to these, comment on the enable_* settings that also affect query planning.
* random_page_cost (default 4.0): Estimated cost of a random page fetch, in abstract cost units. May need to be reduced to account for caching effects. * seq_page_cost (default 1.0): Estimated cost of a sequential page fetch, in abstract cost units. May need to be reduced to account for caching effects. Must always set random_page_cost >= seq_page_cost. * effective_cache_size (default 128M): Used to estimate the cost of an index scan. Rule of thumb is 75% of system memory. * There are plenty of enable_* parameters which influence the planner in choosing an optimal plan. For example: * enable_indexonlyscan enables or disables the query planner's use of index-only-scan plan types
251
What is random_page_cost?
random_page_cost (default 4.0): Estimated cost of a random page fetch, in abstract cost units. May need to be reduced to account for caching effects.
252
What is seq_page_cost?
seq_page_cost (default 1.0): Estimated cost of a sequential page fetch, in abstract cost units. May need to be reduced to account for caching effects. Must always set random_page_cost >= seq_page_cost.
253
What is effective_cache_size?
effective_cache_size (default 128M): Used to estimate the cost of an index scan. Rule of thumb is 75% of system memory.
254
List five Write-Ahead Log settings
* wal_level * fsync * wal_buffers * checkpoint_segments * checkpoint_timeout
255
What is wal_level?
wal_level (default: minimal). Determines how much information is written to the WAL. Change this to enable WAL archiving/replication. Other values are archive and hot_standby.
256
What is fsync?
fsync (default on): Turn this off to make your database much faster – and silently cause arbitrary corruption in case of a system crash.
257
What is wal_buffers?
wal_buffers (default: -1, autotune): The amount of memory used in shared memory for WAL data. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers
258
What is checkpoint_segments?
checkpoint_segments (default 3): Maximum number of 16MB WAL file segments between checkpoints. Default is too small!
259
What is checkpoint_timeout?
checkpoint_timeout (default 5 minutes): Maximum time between checkpoints.
260
What are seven settings controlling where to log error/diagnostic messages?
* log_destination * logging_collector * log_directory * log_filename * log_file_mode * log_rotation_age * log_rotation_size
261
What are the possible value of log_destination?
log_destination. Valid values are combinations of stderr, csvlog, syslog, and eventlog, depending on platform.
262
What is the format convention for log_filename and what is an example?
strftime (but system strftime is not used so you can't use local extensions). postgresql-%Y-%M-%d.log
263
List four settings controlling when to log
* client_min_messages (default NOTICE). Messages of this severity level or above are sent to the client. * log_min_messages (default WARNING). Messages of this severity level or above are sent to the server. * log_min_error_statement (default ERROR). When a message of this severity or higher is written to the server log, the statement that caused it is logged along with it. * log_min_duration_statement (default -1, disabled): When a statement runs for at least this long, it is written to the server log, with its duration.
264
List eight settings controlling what to log
* log_connections (default off): Log successful connections to the server log. * log_disconnections (default off): Log some information each time a session disconnects, including the duration of the session. * log_error_verbosity (default “default”): Can also select “terse” or “verbose”. * log_duration (default off): Log duration of each statement. * log_line_prefix: Additional details to log with each line. * log_statement (default none): Legal values are none, ddl, mod (DDL and all other data-modifying statements), or all. * log_temp_files (default -1): Log temporary files of this size or larger, in kilobytes. * log_checkpoints (default off): Causes checkpoints and restartpoints to be logged in the server log.
265
What does log_temp_files do?
Logs temporary files of this size or larger, in kilobytes
266
Describe three background writer settings
* bgwriter_delay * bgwriter_lru_maxpages * bgwriter_lru_multiplier
267
Describe the bgwriter_delay setting.
bgwriter_delay (default 200 ms): Specifies time between activity rounds for the background writer.
268
Describe the bgwriter_lru_maxpages setting
bgwriter_lru_maxpages (default 100): Maximum number of pages that the background writer may clean per activity round.
269
Describe the bgwriter_lru_multiplier setting
bgwriter_lru_multiplier (default 2.0): Multiplier on buffers scanned per round. By default, if system thinks 10 pages will be needed, it cleans 10 * bgwriter_lru_multiplier of 2.0 = 20.
270
What is the primary background writer tuning technique
The primary background writer tuning technique is to lower the bgwriter_delay.
271
Describe the search_path setting
search_path specifies the order in which schemas are searched. The default value for this parameter is: "$user", public
272
What setting specifies the order in which schemas are searched?
search_path
273
What is the default_tablespace setting?
default_tablespace is the name of the tablespace in which objects are created by default
274
What is the temp_tablespaces setting?
temp_tablespaces holds the tablespace name(s) in which objects are created by default. (The temp table load is evenly spread across the tablespaces in this list).
275
What is the statement_timeout setting?
Any statement that takes more than the specified number of milliseconds will be aborted. The default value is 0 (no maximum statement time).
276
What are some vacuum cost settings?
``` vacuum_cost_delay vacuum_cost_page_hit vacuum_cost_page_miss vacuum_cost_page_dirty vacuum_cost_limit ```
277
What is the vacuum_cost_delay setting?
vacuum_cost_delay is the length of time, in milliseconds, that the process will wait when the cost limit it exceeded. Default for manual VACUUM is 0, so if you want a low-impact manual vacuum, you should set this to a non-zero value (autovacuum uses 20 ms ....)
278
What is the autovacuum setting?
autovacuum (default on) turns on or off autovacuuming
279
What is the log_autovacuum_min_duration setting, and what is the default?
Autovacuum tasks running longer than this duration (in milliseconds) are logged. -1 is the default, which disables this logging.
280
What is the autovacuum_max_workers setting?
autovacuum_max_workers (default 3) is the max number of autovacuum worker processes that may be running at one time
281
What is the setting to define the maximum number of autovacuum worker processes that may be running at once?
autovacuum_max_workers
282
What is the name of the setting to disable or enable autovacuum?
autovacuum!
283
What is the syntax for including a file in a configuration file?
include 'filename'
284
What is the syntax for including a directory of configuration files in a configuration file? Describe how this works.
include_dir 'somedir'. All files named *.conf will be included from the named directory, in C locale filename order. Thus, you could name the files 00_foo.conf, 01_bar.conf, 02_baz.conf to control the order of loading while still having the names be meaningful.
285
What is the setting to reserve connection slots for the DBA?
superuser_reserved_connections
286
What setting allows you to log statements taking longer than the specified number of milliseconds?
log_min_duration_statement
287
How would you configure the server to accept up to 200 connected users?
In postgresql.conf, set: max_connections = 200 | and restart the server
288
How would you configure the server to reserve 10 connection slots for the DBA?
In postgresql.conf, set: superuser_reserved_connections = 10 | and restart the server
289
How would you set the maximum time to complete client authentication to 10 seconds?
In postgresql.conf set: authentication_timeout = 10s | and reload the server
290
How would you configure the server to save all the error messages in a file inside the pg_log folder in your cluster data directory?
Set logging_collector to on
291
How would you log all queries and their time which are taking more than 5 seconds to execute?
Set log_min_duration_statement to 5000
292
How would you log the users who are connecting to the database cluster?
Set log_connections to on and set the log_line_prefix to something include '%u'
293
``` Change following autovacuum parameter on the production server: −Autovacuum workers to 6 −Auto Vacuum threshold to 100 −Auto Vacuum scale factor to 0.3 −Auto Analyze threshold to 100 −Auto vacuum cost limit to 100 ```
``` set autovacuum_max_workers to 6 set autovacuum_vacuum_threshold to 100 set autovacuum_vacuum_scale_factor to 0.3 set autovacuum_analyze_threshold to 100 set autoavacuum_cost_limit to 100 ```
294
In the PG object hierarchy, what are the three objects under "Database Cluster"
A database cluster contains: * Roles (users, groups) * Tablespaces * Databases
295
What are the top-level objects within a database?
A database contains: * catalogs * extensions * schemas
296
What are the objects within a schema?
* tables * views * sequences * functions * event triggers
297
How do you determine the list of databases in a cluster using SQL?
select datname from pg_database;
298
How do you get the list of databases in a cluster using a PSQL meta command?
\l (lower-case L)
299
What is the syntax of the SQL command for creating databases?
CREATE DATABASE name [ [ WITH ] [ OWNER [=] dbowner ] [ TEMPLATE [=] template ] [ ENCODING [=] encoding ] [ TABLESPACE [=] tablespace ] [ CONNECTION LIMIT [=] connlimit ] ]
300
How do you create a schema named `fooz` owned by the user `foozer`?
create schema fooz authorization foozer;
301
What is the default search_path?
"$user", public
302
What is the "current schema"
The current schema is the first schema in the search_path
303
How does PG use the search_path
When trying to resolve object names (tables, etc), PG searches the schemas in the search_path in order. When creating a table, by default the table is created in the "current schema" -- the first schema in the search_path.
304
What is the only real difference between a user and a group?
Users are roles that can log into any database; groups are roles that can NOT log into any database.
305
Do the following things in psql: 1) create the user 'fred' with password 'flinstone'; 2) create a database 'fred' owned by 'fred'; 3) connect to the 'fred' database as 'fred' and create the 'fred' schema; 4) log out.
``` > create user fred with password 'flinstone'; > create database fred with owner fred; > \c fred fred > create schema fred; > \dn > \q ```
306
How do you retrieve a list of databases using SQL?
select datname from pg_database; -- Advanced: list databases with owners: select d.datname, u.usename from pg_database d join pg_user u on (d.datdba = u.usesysid);
307
How do you list databases using a psql metacommand?
\l (lower case L)
308
How would you retrieve a list of tables in a specified database, including schema and owner?
Two methods: 1. (psql meta-command) \d+ 2. (SQL) select schemaname, tablename, tableowner from pg_tables
309
Describe the use of double-quotes in psql
Double-quotes are used to specify an exact name, preserving case
310
Describe psql's strange and redundant way of specifying database and user on the command line
There are two different and equivalent usages of psql: 1) psql [DBNAME [USER]] and 2) psql -U USER DBNAME
311
When invoking psql from the command line, what are the default values for USER and DBNAME?
If the PGUSER and PGDATABASE environment variables are not defined, the default values for USER and DBNAME are the name of the operating system user. If PGUSER and PGDATABASE are defined, those values are used. If just PGUSER is defined, that value is also the default database name.
312
What psql meta-command is used to re-connect to a different database, and/or as a different user?
\c DBNAME [USER]
313
If, from within psql, you want to reconnect to the same database as a different user, what is the short-hand syntax for that?
\c - NEWUSER | -- (note the dash character)
314
What psql meta-command is used to re-connect to a different database, and/or as a different user?
\c DBNAME [USER]
315
If, from within psql, you want to reconnect to the same database as a different user, what is the short-hand syntax for that?
\c - NEWUSER | -- (note the dash character)
316
If the -h and -p switches are not used when invoking psql, how are the host and port determined?
1) If the PGHOST and PGPORT environment variables are defined, they are used. Otherwise, on UNIX systems, a local UNIX socket connection is attempted, or, on Windows, a local TCP connection.
317
How can you have psql meta-commands executed whenever you connect to a database with psql?
psql always runs commands in ~/.psqlrc unless -X is specified
318
How do you show command history in psql?
\s
319
What does the \s command do in psql?
Shows or saves the command history
320
How do you edit the query buffer in psql?
\e
321
What does \e do in psql?
Edits (and then executes) the query buffer
322
How do you save the query buffer in psql?
\w FILENAME
323
What does \w do in psql?
Saves the query buffer (to the specified FILENAME)
324
How do you set variables in psql?
Two methods. 1) (Command line): -v NAME=VALUE 2) (Inside psql): \set NAME VALUE
325
How do you use a variable in psql?
``` Three ways: 1) Unadorned (numbers) :NAME \set NAME 10 select :NAME; 2) Quoted strings :'NAME' \set NAME testing select :'NAME'; 3) Identifiers :"NAME" \set NAME empno select :"NAME" from emp; ```
326
How do you send query output from psql to a file (or pipe)
``` -o FILENAME or \o FILENAME (FILENAME may be a pipe) Example: \o | grep 4 select empno from emp; 7499 7654 7844 7934 (14 rows) ```
327
How do you use a variable in psql?
``` Three ways: 1) Unadorned (numbers) :NAME \set NAME 10 select :NAME; 2) Quoted strings :'NAME' \set NAME 'testing' select :'NAME'; 3) Identifiers :"NAME" \set NAME 'empno' select :"NAME" from emp; ```
328
How do you send query output from psql to a file (or pipe)
-o FILENAME or \o FILENAME (FILENAME may be a pipe)
329
How do you execute the query buffer in psql?
\g [FILENAME] (filename may be a pipe)
330
How do you output only tuples in psql, and what does that mean, actually?
"Tuples only" mode means that column headings are not output, and the final count of rows is not output. Within psql, \t toggle tuples-only output. On the psql command line, -t turns on tuples-only mode. An alternative to \t is \pset tuples_only
331
What does \x do in psql?
Toggles expanded output, where columns are output as rows.
332
How do you output only tuples in psql, and what does that mean, actually?
"Tuples only" mode means that column headings are not output, and the final count of rows is not output. Within psql, \x
333
What does \echo do in psql, and does it have any options?
\echo [-n] [string] Prints a string on STDOUT, followed by a newline, unless the -n switch is used, which suppresses the newline. The output of \echo is not affected by -o or \o.
334
What does \qecho in psql do?
\qecho is like \echo, but its output *is* redirected by -o or \o.
335
What psql command lists databases?
\l
336
What are the attributes listed by \l for each database?
name, owner, encoding, collation, ctype, access privileges
337
What additional attributes are listed by \l+ beyond those produced by \l?
size, tablespace, and descripion
338
What does \dn do?
Lists schemas (namespaces)
339
What attributes are included in \dn output?
name and owner
340
What attributes are included in \dn+ output?
name, owner, access privileges, and description
341
What does \df do?
List functions
342
What attributes are included in \df output?
schema, name, result data type, argument data types, type
343
What attributes are included in \df+ output?
Same as \df, but in addition: security, volatility, owner, language, source code
344
\d(i,s,t,v,S)[+] [pattern]
List info about indexes, sequences, tables, views, or System objects; any combination of letters is possible
345
What does \drds do in psql?
Lists per-database role settings
346
What does \dp do in psql?
List access privileges (for tables, views, and sequences, by default)
347
What does \ef do in psql?
This command fetches and edits the definition of the named function (Can also be used to create new functions - just don't supply a name, and your editor will be invoked on a function definition template).
348
What is an easy way to change the user's password in psql?
\password
349
In psql, how do you describe a table including descriptions of columns?
\d+ TABLENAME
350
What does \d+ give you that \d doesn't?
\d+ also gives you storage, stats target, and description
351
How do you list schemas in psql?
\dn
352
How do you list tablespaces in psql?
\db
353
How do you list tables in psql?
\dt
354
How do execute a SQL statement in psql, saving the output to a file.
``` Two ways: 1. With \g ("one-shot \o"): select * from emp \g FILENAME 2. With \o: \o FILENAME select * from emp; \o 3. From command line: psql -U edbstore -o /tmp/emp.dat -c "select * from emp" edbstore ```
355
In psql, execute a statement, saving just the data to a file, not the column headers.
``` \t \o FILENAME select * from emp; \o \t ```
356
How do you execute a script with psql?
Three ways: 1. \i FILENAME 2. Command line: psql ... < FILENAME 3. Command line: psql ... -f FILENAME ...
357
In psql, how do you list tables, views and sequences with their associated access privileges?
\dp
358
In psql, which meta command displays the SQL text for a function?
Two approaches, sort of: 1. \df+ myfunc (primitive) 2. \ef myfunc (opens your EDITOR to allow you to view and edit source code of the function)
359
In psql, how do you view the current working directory?
\! pwd
360
When trying to connect to a server, what does the message "Could not connect to Server: Connection refused" mean?
It means either that no PostgreSQL server is running on the specified host, or that it is not listening on the specified host and port.
361
What are the three bogus levels of security according to EDB?
* Server/application * Database * Object
362
According to EDB, what step(s) does the Server level of security consist of?
Checking pg_hba.conf: there must be an entry matching type (e.g. host, hostssl), database name, user name, client IP address, and authentication method (md5, trust, etc).
363
According to EDB, what does the "Database" level of security consist of?
* Checking user/password combo * CONNECT privilege on the database * SCHEMA permissions
364
According to EDB, what does the "Object" level of security consist of?
* Object (e.g. table) level privileges, administered with GRANT and REVOKE
365
What are the data fields in each line of pg_hba.conf?
* Type (e.g. host, hostssl) * Database name (or 'all') * User name (or 'all') * Host spec, incl IPv4, IPv6, or DNS hostname * Client IP address modified by CIDR mask * authentication method (trust, reject, md5, password, gss, sspi, krb5, ident, peer, pam, ldap, radius or cert.)
366
Name as many of the 13 authentication methods as you can
trust, reject, md5, password, gss, sspi, krb5, ident, peer, pam, ldap, radius or cert.
367
In pg_hba.conf, what is a 'special' database name value that must be used for, e.g. pg_basebackup connections
replication
368
How do you give access to a user someuser connecting to somedb from a specific IP address a.b.c.d, using md5. Be as restrictive as possible. Don't forget any of the steps required to allow the client to connect.
* Add a line to pg_hba.conf (usually in the cluster's data directory, unless you have specified that it go elsewhere), like: host somedb someuser a.b.c.d/32 md5 * Reload the server to activate the change to pg_hba.conf - send a HUP signal to the postmaster, or execute the SQL "select pg_reload_conf();" as a superuser, or use "pg_ctl -D datadir reload" (or the OS service equivalent that wraps pg_ctl).
369
How would you set up default privileges so that whenever ddl_user creates tables in schema public, readonly_user is granted the ability to select on the tables.
ALTER DEFAULT PRIVILEGES FOR ddl_user IN SCHEMA public GRANT SELECT ON TABLES TO readonly_user;
370
In order to drop a user that owns objects, what do you need to do first?
DROP OWNED or REASSIGN OWNED
371
What privileges are required to access objects in a particular database?
* ) pg_hba.conf - ability to authenticate * ) CONNECT privilege on the database (this is default) * ) USAGE privilege on the relevant schema * ) SELECT, UPDATE, INSERT, DELETE, EXECUTE, etc privilege on the object in the schema (grant some_privs on all tables in schema foo to some_user)
372
What is the type of an 8-byte integer, and what is an alias?
bigint (int8)
373
What is the type of a 4-byte auto-incrementing integer?
serial
374
What is the type of an 8-byte auto-incrementing integer?
bigserial
375
What is the type of binary data ("byte array")
bytea
376
What is the type of a 4-byte floating point number?
real (float4)
377
What is the type of an 8-byte floating point number?
double precision (float8)
378
What is the specific type intended to be used for a currency amount?
money
379
What is an exact numeric of selectable precision?
numeric(p,s) (p=precision, the total number of digits, and s=scale, the number of digits in the fractional part)
380
Does PostgreSQL support JSON?
Yes, it has a 'json' data type for storing JSON (JavaScript Object Notation) data
381
What range types does PostgreSQL support?
int4range, int8range, numrange, tsrange, tstzrange, daterange (note: no floating point range)
382
When specifying a column in a CREATE TABLE statement, which comes first, the column name, or the type?
The column name comes first, before the column type.
383
Give an example of an insert statement
insert into departments(dep_id, name) values (1, 'Finance');
384
How can you insert multiple rows with a single INSERT statement?
insert into departments(dep_id, name) values (1, 'Finance'), (2, 'Silly Walks');
385
Give an example of an update statement
update departments set name='development' where dep_id = 1;
386
Give an example of a delete statement
delete from departments where department_id = 2
387
How and why would you use dollar quoting?
$$Don't want no stinkin' single quotes$$ -- or: $foo$Don't want it$foo$
388
What are double quotes used for?
Double quotes are used to delimit database object names that clash with keywords, contain mixed case, or contain special characters (something other than a-z, 0-9, or underscore).
389
What is the SQL concatenation operator?
||
390
Name five types of constraints
``` check constraints not-null constraints unique constraints primary keys foreign keys ```
391
Give the syntax for CREATE SEQUENCE
CREATE [TEMPORARY | TEMP] SEQUENCE name [INCREMENT [ BY ] increment] [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ] [ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]
392
How do you get the next value from the sequence named myseq?
nextval('myseq')
393
``` What is the sequence value returned by nextval after these commands: CREATE SEQUENCE serial START 101; SELECT nextval('serial'); ```
101
394
What does the nextval function do, and describe its usage
Advances the sequence and returns a new value. The single argument should be the name of the sequence, as a string.
395
What does the currval function do?
currval returns the *most recently used* value for a specific sequence.
396
What is the result of the following two commands: CREATE SEQUENCE serial START 101; SELECT currval('serial');
An error; currval is the most recently used/allocated value for the sequence, so it is undefined until nextval() has been called.
397
What does the setval function do?
Sets the next value to be returned by the sequence.
398
Does PostgreSQL have updatable views?
Not by default; you have to create rules in order to get updatable views.
399
If managers are only supposed to be able to see employee data for their own department, i.e. does it provide row-level security, to define a view that is department specific like this: `create view sales_emp as select * from emp where deptno = 30` and only give managers from deptno 30 the ability to run this view?
No, it is not secure to do so, *unless* the view is create with the "with (security_barrier)" option.
400
Does the optimizer treat a view as a full-fledged query, or a subquery?
Subquery. Don't ask me what this means, though.
401
What are materialized views?
PG 9.3 has materialized views, which are like pre-computed views, with the option to refresh the snapshot of data stored in the materialized view. You could also view materialized views as being like tables that can only be populated with a single query.
402
How do you create a materialized view
CREATE MATERIALIZED VIEW myview AS SELECT blah, blah blah;
403
How to you update the data in a materialized view?
REFRESH MATERIALIZED VIEW myview;
404
What is a lateral subquery?
Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to reference columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item.) LATERAL is primarily useful when the cross-referenced column is necessary for computing the row(s) to be joined. A common application is providing an argument value for a set-returning function.
405
What index types are supported by PostgreSQL?
* B-tree (default) * Hash (not crash safe) * Index on expressions (use when quick retrieval is needed on a frequently used expression) * Partial index (index only rows that satisfy the WHERE clause, which need not include the indexed column; a query must use the same WHERE clause in order to use the partial index) * SP-GiST indexes (space-partitioned GiST supported partitioned search trees)
406
What is the syntax for CREATE INDEX?
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ] ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ] [ TABLESPACE tablespace_name ] [ WHERE predicate ]
407
Is it necessary to wrap the "ON " clause of a JOIN query in parentheses?
No. (However, it *is* necessary to wrap the column argument of "USING " in parentheses in a JOIN query that uses one).
408
What does "TABLE " do?
It is equivalent to "SELECT * FROM "
409
Are OIDs dumped by default by pg_dump? Comment on any relevant command line switches.
No, OIDs are not dumped by default. Use the -o (little o) switch to dump OIDs
410
What does the -j switch to pg_dump do, and what other command line options must be specified in order to make use of it?
You must use the `-Fd` (directory format) switch along with `-j`. E.g.: `pg_dump -Fd -j 4` would dump to directory format using 4 parallel jobs.
411
If you do a no-frills dump of a database, e.g. `pg_dump dbname > dbname.sql`, what is the easiest, most basic way to restore it?
It should be restored into an empty database. If you drop the database dbname and recreate it, then you can run: psql dbname < dbname.sql
412
Are all backups made by pg_dump, including non-text backups, portable across architectures?
Yes, all backups made by pg_dump are portable across architectures.
413
List as many of the access privilege codes (as seen in psql \dp output, for instance) as you can
``` r -- SELECT ("read") w -- UPDATE ("write") a -- INSERT ("append") d -- DELETE D -- TRUNCATE x -- REFERENCES t -- TRIGGER X -- EXECUTE U -- USAGE C -- CREATE c -- CONNECT T -- TEMPORARY ```
414
How do you know when a (point-in-time) recovery is done?
recovery.conf in the data directory will have been renamed to recovery.done
415
If you just want to recover up to the most recent WAL file during a recovery, what do you need to put in recovery.conf?
Just: | restore_command = 'cp /mnt/server/archivedir/%f "%p"' or what have you
416
If you want to recover to a specific timestamp, what do you need to put in recovery.conf?
``` recover_command = 'cp /mnt/server/archivedir/%f "%p"' recover_target_time = ' ```