Basic Administration Flashcards

1
Q

Name some security features

A
  • SSL
    • host-based authentication
    • object-level permissions
    • logging
    • groups & roles
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Name some recovery and availability features

A
  • Streaming replication (async and sync), aka hot standby
    • Cascading streaming replication
    • pg_basebackup, hot backup
    • PITR (point in time recovery)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Name some cool advanced features

A
  • triggers and functions
    • many procedural languages (pgSQL, Perl, TCL, PHP, Java, Python, etc)
    • custom procedural languages
    • upgrade using pg_upgrade
    • unlogged tables
    • materialized views
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What does MVCC stand for

A

Multi-Version Concurrency Control

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

List some characteristics of MVCC

A
  • Maintains data consistency.
    • Each transaction sees a snapshot of the database as it was at the beginning of the transaction.
    • Writers block writers; nothing else blocks.
    • Concurrent transactions are isolated.
    • Prevents transactions from seeing inconsistent data.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What does WAL stand for?

A

Write-Ahead Logs (or Write-Ahead Logging)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How does WAL work?

A
  • Records each data change before it actually takes place.
    • Data is not considered ‘safe’ until the log is written to disk.
    • Provides recovery in case of crash or failure.
    • As DML is executed, changes are recorded in memory in WAL buffers (as well as the shared data buffers).
    • When COMMIT is executed, the WAL buffers are flushed to the WAL segment files
    • Later the dirty data buffers are written to disk
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is ACID?

A

Atomicity, Consistency, Isolation, Durability

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

List some commonly used pg_dump switches

A
  • a Data only
  • s Definitions (“schema”) only
  • n NAME Dump named schema (“namespace”)
  • t NAME Dump named table
  • Fp Plain format
  • Fc Custom format
  • Fd Directory format
  • Ft Tar format
  • f FILE Dump to named file
  • o Dump OIDs
  • j JOBS Parallelize directory format dumps
  • v Verbose
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Name four general methods for backup

A
  • SQL dump (pg_dump, pg_dumpall)
    • Filesystem dump
    • Continuous archiving
    • Streaming replication
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

List two methods (command line examples) for dealing with very large databases when doing SQL backups

A
  • pg_dump … | gzip -c > file.sql.gz

* pg_dump … | split -b 1m - filename

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Describe two methods for restoring SQL dumps done with pg_dump

A
  • psql < TEXT_FORMAT_DUMP_FILE (or psql -f TEXT_FORMAT_DUMP_FILE)
    • pg_restore NON_TEXT_FORMAT_DUMP_FILE
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Using command defaults for pg_dump, when you restore a backup into a new cluster, is there anything you need to do first?

A

Yes, create the target database. By default, pg_dump does not include the command to create the target database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What does the –create (-C) switch to pg_dump do?

A

pg_dump --create inserts a command to create and reconnect to the target database before the commands to populate that database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What does the –clean switch to pg_dump do?

A

pg_dump --clean drops the target database prior to recreating it, when using the –create (-C) switch.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

List important pg_restore options

A

pg_restore

  • d DB Connect to the specified database, and, if -C (–create) is not specified, restore into this database also.
  • C Create the database specified in the backup and restore into it
  • a Restore data only (assumes the SQL objects like tables have already been created)
  • s Restore object definitions (DDL) only, not data
  • t NAME Restore named table
  • n NAME Restore named schema/namespace
  • v Verbose
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What command can dump an entire cluster as SQL?

A

pg_dumpall

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What does pg_dumpall do?

A

pg_dumpall dumps all databases in the cluster, and also global objects such as roles and tablespaces.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

List commonly used pg_dumpall switches

A

pg_dumpall

  • a Dump data only
  • s Dump definitions only
  • c Clean (drop) objects before recreating
  • g Dump global objects (roles, etc) only, not databases
  • r Dump roles only, not databases or other global objects
  • O Skip ownership commands
  • x Skip privileges
  • -disable-triggers Disable triggers during data-only restore
  • v Verbose
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

How many times does pg_dumpall ask for authentication credentials?

A

Once per database (and more?)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Why is it a good idea to use a pgpass file when using pg_dumpall?

A

A pgpass file is a good idea for pg_dumpall because the user’s credentials are requested once per database in the cluster.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

List some characteristics of SQL dumps as a backup mechanism

A
  • Generate a text file containing SQL commands (or a binary representation thereof)
  • pg_dump is the relevant command (or pg_dumpall for the entire cluster)
  • pg_dump does not block readers or writers
  • pg_dump does not operate with special permissions
  • pg_dump dumps are internally consistent and are a snapshot of the database at the time pg_dump begins running
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Next to SQL dump, what is the next most simple backup approach?

A

Simple file system level backup is an alternative to SQL dumps, provided that either 1) the database is shut down during the backup, or 2) the native snapshot feature of the filesystem is used (if available).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Why mightn’t filesystem snapshots work without database downtime, even if your filesystem supports this feature?

A

Filesystem snapshots might not work if the database is spread across multiple filesystems; if not, you would have to stop the database to take the multiple snapshots.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Describe the process of restoring a backup made via a filesystem snapshot

A

A filesystem snapshot backup can be restored by 1) copying the backup into place (including the WAL logs) and 2) starting the database, which will go into crash recovery mode and replay WAL logs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

How can you take your filesystem snapshot backup so that the restore is as fast as possible?

A

To minimize restoration time of a filesystem snapshot, force a CHECKPOINT right before the snapshot – this will minimize the amount of WAL logs that have to be replayed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Can you use tar to back up a database? If so, describe how.

A

Yes, you can back up a PostgreSQL database by 1) shutting down the database; 2) using tar to copy all of the files in the database cluster; and 3) restarting the database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

How would you use rsync to backup a database?

A

You can back up a database using rsync as follows: 1) With the database running, use rsync to copy all the files. 2) Shut down the database. 3) Use rsync again to copy the files; this second rsync will create a consistent image of the database and will be quite fast, minimizing downtime.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

After SQL dumps and simple filesystem backups, what is a third , more complicated mechanism for backups?

A

A third backup approach is continuous archiving of WAL logs (combined with a possibly inconsistent filesystem backup, such as produced by tar, even with the database running).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

What is another name for continuous archiving as used for backup purposes?

A

Online backup

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

What is the primary purpose of WAL logs?

A

The primary purpose of WAL logs is to allow database commits to happen quickly (without the data being fully written to the final data pages) but to prevent loss of information in case of a crash – the WAL logs can be “played” when the database starts up after a crash, thus restoring the physical database to match its logical state at the time of the crash.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

What is a secondary use to which WAL logs can be put?

A

Online backup – first, get WAL log archiving started, in which full and switched WAL logs are copied to backup storage before being recycled; then take a file system backup while the database is running (tar, rsync, etc). If the database crashes, or if you want to revert the database to a specified point in time (PIT), you can copy the original full backup into place along with the archived WAL files, and start the database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

How do you enable continuous archiving of WAL files?

A
  • wal_level must be set to archive or hot_standby (as opposed to, say, minimal).
  • archive_mode must be set to on (default is off).
  • archive_command must be defined – a command to copy the WAL files somewhere.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

How can you achieve point in time recovery (PITR) with continuous archiving?

A

PITR can be accomplished by having the database replay the WAL files only up to a specified file, not all the way to the last file.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

How can you use continuous archiving to achieve warm standby?

A

Have a second server loaded with the base backup file (filesystem-level backup), and feed the archived WAL files to this second server. At any time, the WAL files can be replayed on this second server so that the second server can take over from the first with a nearly identical state.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

Why can’t pg_dump and pg_dumpall be used with online backup via continuous archiving?

A

pg_dump and pg_dumpall produce logical, not physical backups of the database. They don’t capture enough information for the WAL files to

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

What is the default size of WAL segment files?

A

16MB – this can be changed by recompiling PostgreSQL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

How are WAL segment files named?

A

Numerically, according to the position in the abstract WAL sequence. I say abstract sequence because there is only a small number of physical files, which are recycled by renaming when a particular WAL file is no longer needed (it has been checkpointed; i.e. the changes it encodes have been reflected to the actual data pages).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

What does the simplest useful archive_command value on Unix systems?

A

archive_command = ‘test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

What does %f mean in the archive_command?

A

%f is the base file name of the WAL file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

What does %p mean in the archive_command?

A

%p is the full path name of the WAL file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

How do you specify a literal percent sign in the archive_command?

A

%%

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

Describe the return value/exist status of the archive_command

A

0 if the file could be copied successfully; otherwise, non-zero.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q

What happens if the archive_command starts returning non-zero for every file (perhaps because of network error, unmounted fs, etc)?

A

The pg_xlog directory will continue to fill up with WAL files. If the containing filesystem fills up, PostgreSQL will do a “panic” shutdown – no committed transactions will be lost, however.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q

Does continuous archiving back up changes made to postgresql.conf, pg_hba.conf, etc?

A

No, you must have another approach to back up changes to the configuration files in the data directory.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q

If you exclude the PostgreSQL data directory from the server’s normal backup procedures, how can the config files be continuously backed up (since continuous archiving does not do so)?

A

Start PostgreSQL with -D config_dir, where config_dir is a directory outside of the data directory, some place where the normal operating system backup procedures will back the directory up. Then, in postgresql.conf within this directory, use the data_directory parameter to point to the actual data directory.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
47
Q

How could you temporarily turn off WAL archiving without stopping the server?

A

Set archive_command to the empty string and reload the server – WAL files will start to accumulate in the pg_xlog directory, though.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
48
Q

What is pg_basebackup?

A

pg_basebackup is a command for taking a base backup of the live database, to which you would apply archived WAL files in order to recover from a disaster.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
49
Q

Describe pg_receivexlog

A
  • streams transaction logs from a running cluster
  • uses streaming replication protocol
  • these files can be used for PITR
  • logs streamed in real time
  • can be used instead of archive command
  • example: pg_receivexlog -h localhost -D /usr/local/pgsql/archive
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
50
Q

Describe how the archive_command `cp -i %p /mnt/server/archivedir/%f

A

Uses the Unix file copy (cp) command to copy the source WAL file (%p - path) to the desired destination path (/mnt/server/archivedir/%f). The twist is that the interactive (-i) switch is used, presumably to prevent an already existing file of the same name from being overwritten. “But wait”, you say, “how can this command be interactive? There’s no human involved!” That’s where the redirection from /dev/null comes in – if a file would be overwritten, the prompt is written, and the empty response causes an error (status 1), which is what we would want in this scenario.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
51
Q

Describe the “low level API” for base backup that is an alternative to using the pg_basebackup command

A
  • Connect with psql and issue the command “select pg_start_backup(‘backup label’);”
  • Back up the data directory using tar or rsync, etc.
  • Now issue the SQL command “select pg_stop_backup()”.
    [Note: no label is required for the stop; it’s assumed (or enforced) that there is only one backup happening at a time.]
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
52
Q

What CLI command can be used to take a base backup of a live cluster?

A

pg_basebackup

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
53
Q

Can a backup taken by pg_basebackup be used for PITR?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
54
Q

Can a backup taken by pg_basebackup be used for streaming replication?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
55
Q

What does pg_basebackup do, exactly?

A

It uses the low-level backup API (pg_start_backup(‘label’) and pg_stop_backup()) wrapped around some binary copying mechanism, i.e. it automatically puts the database in and out of backup mode.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
56
Q

List important pg_basebackup switches/options

A
  • D - destination for backup
  • F <p> - format: plain or tar
  • X - include log files created during the backup, so the backup is fully usable
  • z - compress with gzip
  • Z - compression level
  • P - enable progress reporting
  • h - host on which cluster is running
  • p - cluster port</p>
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
57
Q

Describe what this command is doing: pg_basebackup -h localhost -D /usr/local/pgsql/backup

A

pg_basebackup is being used to take a base backup of the cluster running on localhost; the files will be written to /usr/local/pgsql/backup/.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
58
Q

Describe the configuration steps required to use pg_basebackup

A
  • Modify pg_hba.conf to add a replication connection, e.g.: “host replication postgres IP_ADDR/32 trust”
  • archive_command = ‘cp -i %p /dest/dir/%f
  • archive_mode = on # requires restart
  • max_wal_senders = 3
    wal_keep_segments = NUM
    wal_level = archive
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
59
Q

Describe the steps to performing (Point-In-TIme) Recovery

A
  • Stop the server (if it is running).
  • If you have enough space, keep a copy of the data directory and the transaction logs.
  • Remove all directories and files from the cluster data directory.
  • Restore the database files from the base backup (file system backup).
  • Verify the ownership of restored backup directories (must not be root).
  • Remove any files in pg_xlog/.
  • If you have any unarchived WAL segment files recovered from the crashed cluster, copy them in pg_xlog.
  • Create a recovery command file recovery.conf in the cluster data directory.
  • Start the server.
  • Upon completion of the recovery process, the server will rename recovery.conf to recovery.done.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
60
Q

List the settings (with types) in the recovery.conf file used for point-in-time recovery

A
restore_command (string) 
# unix: restore_command = 'cp /archive/dir/%f "%p"'
# windows: restore_command = 'copy c:\\archive\\\dir\\"%f" "%p"'
recovery_target_name (string)
recovery_target_time (timestamp)
recovery_target_xird (string)
recovery_target_inclusive (boolean)
recovery_target_timeline (string)
pause_at_recovery_target (boolean)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
61
Q

Why might one choose to set archive_timeout to something other than the default of 0?

A

When doing WAL archiving, remember that only full (16MB) WAL files are shipped, so if the transaction rate and volume are low, you could be exposed to losing data in case of catastrophe. The archive_timeout forces a WAL log to be shipped after the specified number of seconds has passed, regardless of whether it is full or not. If the value is too low (but not zero), it could lead to WAL bloat in the archive.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
62
Q

Describe what the pg_xlogdump contrib module and how to use it

A
  • pg_xlogdump displays the WAL in human-readable format
  • can give wrong results when the server is running
  • Syntax: pg_xlogdump [[startseg] [endseg]]
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
63
Q

What is the name of the contrib module that can display the WAL in a human-readable format

A

pg_xlogdump

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
64
Q

What are the two methods for upgrading PostgreSQL?

A
  • ) Old-school: pg_dump and then restore into the new cluster
  • ) New-school: pg_upgrade
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
65
Q

List important features of pg_upgrade

A
  • Helps upgrade between major releases of PostgreSQL (that would ordinarily require dump and restore)
  • Supports upgrading PG 8.3.X or later to latest version
  • Verifies old and new clusters are binary compatible
  • Option for checking clusters (?)
  • Can do in-place or side-by-side upgrade
  • Side-by-side upgrade requires double storage
  • Can be done with parallel jobs in PG 9.3+
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
66
Q

List important pg_upgrade options/switches

A
  • b old_bindir
  • B new_bindir
  • d old_datadir
  • D new_datadir
  • p old_port
  • P new_port
  • c check clusters only (no change)
  • j # of jobs
  • k use hard links instead of copying files
  • r retain SQL and log files even after successful completion
  • u user_name
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
67
Q

What PG architecture component creates execution plans?

A

Planner

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
68
Q

What PG architecture component chooses the most efficient plan for a query?

A

Optimizer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
69
Q

List some execution plan components

A
row estimates (cardinality)
access method: sequential or index
join method: hash, nested loop, etc
join type and order
sorting and aggregates
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
70
Q

Describe EXPLAIN usage

A
EXPLAIN (options) statement
where option can be one of:
ANALYZE
VERBOSE
COSTS
BUFFERS
TIMING
FORMAT {TEXT|XML|JSON|YAML}
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
71
Q

In the explain output:

seq scan on emp (cost=X..Y rows=R width=W) what are X, Y, R, and W?

A

X=estimated startup cost
Y=estimated total cost
R=estimated rows output
W=estimated average width in bytes of rows output

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
72
Q

Plans are made and costed using table statistics - what are two primary table statistics?

A

Table statistics in pg_class include retuples, the estimated number of rows in a table, and relpages, the estimated number of disk blocks taken up by a table or index.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
73
Q

Are table statistics updated in real time?

A

No, table statistics are updated using ANALYZE or VACUUM ANALYZE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
74
Q

What tables are the table statistics stored in?

A

pg_class and pg_statistic store the table statistics; pg_stats is a view on pg_statistic that is more commonly used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
75
Q

What gets analyzed when you run ‘ANALYZE’ (with no options)?

A

The tables and indexes in the current database that the current user owns (is that right?)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
76
Q

Can you ANALYZE just a single table or column

A

Yes - “ANALYZE some_table” or “ANALYZE some_table(some_col)”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
77
Q

Does autovacuum run ANALYZE

A

Yes, autovacuum runs ANALYZE by default.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
78
Q

Does a DELETE immediately remove a row?

A

No, DELETE merely marks a row as deleted; the row space can be reused or removed by VACUUM .

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
79
Q

What are four useful functions of VACUUM?

A
  • Can recover or reuse space occupied by obsolete rows (from deletes and updates)
  • Can (via ANALYZE option) update data statistics
  • Updates the visibility map, which speeds up index-only scans
  • Protects against loss of very old data due to transaction ID wraparound
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
80
Q

What is the visibility map useful for?

A

It makes index-only scans more efficient (and more likely to be chosen by the query planner) - the full heap tuple doesn’t need to be read in order to determine whether an index entry is valid or not.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
81
Q

What does plan VACUUM do (as opposed to VACUUM full)?

A
  • removes dead rows and marks the space available for future reuse.
  • Does not shrink the file except for dead rows at the end of the table
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
82
Q

What does VACUUM FULL do that plain vacuum does not do?

A
  • More aggressive algorithm (what does this mean?)
  • Rewrites the entire table with no dead space
  • Takes a hell of a lot more time
  • Requires at least twice the disk space of the original copy of the table
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
83
Q

Describe the problem of transaction ID wraparound failure?

A

Each transaction has a 32-bit ID (XID) that is allocated serially, and each row version is marked with the XID of the transaction that created it. A transaction is not allowed to see rows with XIDs that are larger than its own, because these are “in the future”. But because there is a limit to how large XIDs can be (2^32, approx 4 billion), at some point XIDs wrap around to 0, at which point a row with a new, low XID is seen as older than higher XIDs, when actually it is newer. Vacuuming can “freeze” old rows by assigning them a special XID that means “old, visible by all”. This prevents wraparound failure, although I am not sure about the details.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
84
Q

How often must a table be vacuumed to avoid transaction ID wraparound failure?

A

At least once every two billion transactions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
85
Q

What is the visibility map?

A

The visibility map for a table keeps track of which pages contain only tuples that are visible to all current and future transactions (until the page is modified, anyway). This has two benefits: 1) it helps vacuum avoid looking at pages unnecessarily, and 2) it allows index scans to avoid grabbing the heap tuple merely to determine if the current transaction is allowed to see the tuple for an index entry. The VM is tiny compared to the table/heap, so for very large tables, the cost savings can be significant. This allows “index-only scans” to be used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
86
Q

What is the name of the file that contains the visibility map for a table?

A

_vm

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
87
Q

How does the visibility map get updated?

A

VACUUM updates it.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
88
Q

What can you say about the size of the visibility map of a table?

A

The visibility map is very small, so it is readily cached, and many index entries can be checked for visibility with very little memory or disk I/O.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
89
Q

When do you need to run REINDEX?

A

REINDEX should be run when:

  • an index is corrupted (rare)
  • an index is bloated (many almost pages)
  • a storage parameter for the index (like fillfactor) has been changed
  • an index build with CONCURRENTLY failed, leaving an invalid index
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
90
Q

What is the syntax for REINDEX?

A

REINDEX {INDEX|TABLE|DATABASE|SYSTEM} name [FORCE]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
91
Q

Where is meta-information about tables and other objects stored?

A

The pg_catalog schema is where system information about a database is stored.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
92
Q

What sorts of objects are stored in pg_catalog?

A

The following objects are stored in the pg_catalog schema:

  • System tables (like pg_class)
  • System functions (e.g. pg_database_size)
  • System views (pg_stat_activity)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
93
Q

Is pg_catalog always part of the search path?

A

Yes, pg_catalog is effectively part of the search_path.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
94
Q

What psql command lists system information tables and views?

A

The \dS psql command lists tables and views from the pg_catalog schema (in addition to other tables and schemas in the search path).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
95
Q

What are the system catalog tables holding tables, constraints, indexes, triggers, and views?

A

pg_tables, pg_constraint (no s), pg_indexes, pg_trigger (no s), and pg_views - (thanks for the consistency, guys!) These are all views except for pg_constraint.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
96
Q

What is the system function for showing the database to which you are connected?

A

current_database() is the system function for showing the database to which you are connected

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
97
Q

What is the system function for showing the first schema in the search path?

A

current_schema() is the system function for showing the first schema in the search path.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
98
Q

What are the system functions for showing the client IP address and port and server address and port for the current connection?

A

inet_client_addr, inet_client_port, inet_server_addr, inet_server_port

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
99
Q

What is the system function for telling how long the cluster has been up?

A

pg_postmaster_start_time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
100
Q

What are the columns in the \dS output?

A

schema, relation name, relation type, and owner

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
101
Q

What extra columns are shown by \dS+?

A

size and description (the latter is empty for system objects)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
102
Q

What is the difference between the session_user and current_user/user system functions?

A

session_user() is analogous to the UNIX “real user”, current_user() to the “effective user”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
103
Q

What system function returns the array of schemas in the search path?

A

current_schemas(boolean)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
104
Q

What does the boolean parameter to the current_schemas() function do?

A

If true, the schemas implicitly in the search path (usually pg_catalog) are also included. If false, just the schemas from the normal explicit search_path are included.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
105
Q

What is the system function for returning the value of a configuration parameter/variable?

A

current_setting(setting_name_str)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
106
Q

What is the system function for modifying a configuration variable?

A

set_config()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
107
Q

What SQL command is the current_setting() function equivalent to?

A

SHOW

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
108
Q

What SQL command is the set_config() function equivalent to?

A

SET

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
109
Q

What system function cancels the current query of a backend process?

A

pg_cancel_backend(pid) cancels the current query in a backend process - the argument is pid

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
110
Q

What system function terminates/kills a backend process?

A

pg_terminate_backend(pid)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
111
Q

What is the usage for the system function set_config()?

A

set_config(setting, new_value, is_local_to_transaction) (if not transaction-specific, then applies to the session).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
112
Q

What is the system function for reloading the PostgreSQL config files?

A

pg_reload_conf() reloads the configuration files

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
113
Q

What system function rotates the server’s log file?

A

pg_rotate_logfile

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
114
Q

What two functions are used to start and stop online backup?

A

pg_start_backup(label, [fast]) and pg_stop_backup()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
115
Q

What is the function for determining the disk space used by a tablespace?

A

pg_tablespace_size(name_or_oid)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
116
Q

What is the function for determining the disk space used by a database?

A

pg_database_size(name_or_oid)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
117
Q

What is the function for determining the disk space used by a relation, not including indexes and toasted data?

A

pg_relation_size(name_or_oid)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
118
Q

What is the function for determining a relation’s TOTAL size, including indexes and TOAST data?

A

pg_total_relation_size(name_or_oid)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
119
Q

Is the size reported in the size column in the \d+ output the same as pg_relation_size() or pg_total_relation_size()?

A

Neither one!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
120
Q

What system function reports the number of bytes required to store the given operand?

A

pg_column_size(something)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
121
Q

List some file operation functions

A

pg_ls_dir, pg_read_file, pg_stat_file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
122
Q

What system function can be used to return filenames in a specified directory?

A

pg_ls_dir(dir_relative_to_data_dir) - superuser only. E.g.: select pg_ls_dir(‘.’) lists all files in the data directory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
123
Q

What system function can read a file

A

pg_read_file(path) reads a file, one line per row

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
124
Q

What psql command can you use to determine a function’s usage?

A

\df func_name

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
125
Q

What system view can you use to see details of open connections and running transactions?

A

pg_stat_activity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
126
Q

What system view can you use to see the list of current_locks being held?

A

pg_locks

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
127
Q

What system view can you use to see details on databases?

A

pg_stat_database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
128
Q

What system view can you use to see usage statistics for tables?

A

pg_stat_user_tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
129
Q

What system view can you use to see usage statistics for indexes?

A

pg_stat_user_indexes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
130
Q

What system view can you use to see usage statistics for functions?

A

pg_stat_user_functions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
131
Q

Write a query to find the list of schemas currently in your search path

A
-- Show all schemas explicitly in search path:
select current_schemas(False);
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
132
Q

You need to determine the names and definitions of all of the views in your schema. Create a report that retrieves view information: the view name and definition text.

A

select viewname, definition from pg_views where schemaname = ‘edbstore’;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
133
Q

Write a statement in psql to reload the config file.

A

select pg_reload_conf();

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
134
Q

Create of report of all the users who are currently connected. The report must display total session time of all connected users.

A

select usename as user, now()-backend_start as session_time from pg_stat_activity;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
135
Q

You found a user connected to server since long time and decided to gracefully kill its connection. Write a statement to perform this task.

A

select pg_terminate_backend(pid) from pg_stat_activity where usename = ‘blah’;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
136
Q

Write a query to display name and size of all the databases in your cluster. Size must be displayed using a meaningful unit.

A

select datname, pg_size_pretty(pg_database_size(oid)) from pg_database order by pg_database_size(oid);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
137
Q

What does the COPY command do?

A

COPY moves data between tables and file-system files on the database server

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
138
Q

What is the SQL command for importing data from a file or program execution into a table, and give basic syntax?

A

COPY table_name FROM ‘filename’
– or:
COPY table_name FROM PROGRAM ‘command’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
139
Q

What does COPY … FROM do?

A

Copies data from a file into a table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
140
Q

Can COPY operate on selected columns in a table?

A

Yes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
141
Q

What is the SQL command for exporting data from a table or a query to a file (give basic syntax)?

A

COPY table_name TO ‘filename’
– or:
COPY table_name TO PROGRAM ‘command’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
142
Q

What is the detailed syntax of COPY TO?

A

COPY { table_name [ ( column_name [, …] ) ] | ( query ) }
TO { ‘filename’ | PROGRAM ‘command’ | STDOUT }
[ [ WITH ] ( option [, …] ) ]
where option can be one of:
FORMAT format_name
OIDS [ boolean ]
FREEZE [ boolean ]
DELIMITER ‘delimiter_character’
NULL ‘null_string’
HEADER [ boolean ]
QUOTE ‘quote_character’
ESCAPE ‘escape_character’
FORCE_QUOTE { ( column_name [, …] ) | * }
FORCE_NOT_NULL ( column_name [, …] )
ENCODING ‘encoding_name’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
143
Q

What is the SQL command for exporting the contents of the emp table to the file /tmp/emp.csv on the database server, using a CSV format, with a header included?

A

COPY emp TO ‘/tmp/emp.csv’ WITH (FORMAT CSV, HEADER);

– Don’t forget the parentheses around options!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
144
Q

How would you use the COPY command on a remote host using psql on that remote host to load data from a file on the local host (pretend that psql isn’t on the local host, or you don’t have permission to connect from the local host). Say the file is called emp.csv, the remote host is remote.host, the DB user and database are edbstore/edbstore, and the target table is emp.

A

cat emp.csv | ssh remote.host “psql -U edbstore edbstore -c ‘copy emp from stdin;’”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
145
Q

What does COPY FREEZE do? Under what circumstances can you use it? What are the caveats? What is the syntax?

A

COPY tablename FROM filename (FREEZE) will freeze the loaded rows. It can only be used if the target table was previously created or truncated in the same transaction. This prevents VACUUM from having to do this freezing at some point in the future. The caveat is that the rows will be visible to all other transactions as soon as they are loaded (before the end of the enclosing transaction) – this is a violation of MVCC.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
146
Q

What SQL command would you use (as the superuser on the database server) to dump the contents of the emp table to a csv file, with column headers and a pipe delimiter?

A

COPY emp TO ‘/tmp/emp.csv’ WITH (FORMAT CSV, HEADER, DELIMITER ‘|’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
147
Q

What SQL command would you use to create a table copyemp with the same structure as the emp table?

A

CREATE TABLE copyemp (LIKE emp);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
148
Q

Name some reliability features of PostgreSQL

A
  • ACID-compliant
  • Supports transactions
  • Supports savepoints
  • Uses Write-Ahead Logging
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
149
Q

Name some features of PostgreSQL that increase scalability

A
  • MVCC (connection scalability)
  • Table partitioning (size)
  • Tablespaces (size)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
150
Q

What is the maximum database size in PostgreSQL?

A

unlimited

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
151
Q

What is the maximum table size?

A

32 TB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
152
Q

What is the maximum row size?

A

1.6 TB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
153
Q

What is the maximum field size?

A

1 GB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
154
Q

What is the maximum rows per table?

A

Unlimited

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
155
Q

What is the maximum columns per table?

A

250-1600, depending on column types

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
156
Q

What is the maximum indexes per table?

A

Unlimited

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
157
Q

What is the PostgreSQL term for a table or index?

A

Relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
158
Q

What is a “relation” in commercial database terminology?

A

A table or index

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
159
Q

What is the PostgreSQL term for a row?

A

Tuple

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
160
Q

What is a “tuple” in commercial database terminology

A

Row

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
161
Q

What is the PostgreSQL term for a column

A

Attribute

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
162
Q

What is an “attribute” in commercial database terminology?

A

Column

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
163
Q

What was the academic precursor to PostgreSQL?

A

Ingres

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
164
Q

At what institution was PostgreSQL first created?

A

UC Berkeley

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
165
Q

What commercial database products trace their lineage to PostgreSQL or Ingres?

A

SQL Server, Informix, Ingres

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
166
Q

What is the postmaster?

A

The postmaster listens for connections from clients and spawns new a new backend process to handle each connection. The postmaster manages these backend processes as well as other background utility processes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
167
Q

Does PostgreSQL use threads?

A

No, it uses processes

168
Q

What are three major components of shared memory in PostgreSQL?

A
  • Shared buffers (data buffers)
  • WAL buffers
  • Process array
169
Q

What are seven background utility processes managed by the postmaster?

A
  • bgwriter
  • stats collector
  • checkpointer
  • archiver
  • autovacuum
  • log writer
  • WAL writer
170
Q

What are four major categories of disk files used by PostgreSQL?

A
  • Data files and friends (indexes, visibility map)
  • WAL segments
  • Archived WAL
  • Error/diagnostic log files
171
Q

What does the background writer do?

A

Writes dirty data blocks to disk when room is needed for more blocks in shared memory.

172
Q

What background process writes dirty data blocks to disk?

A

background writer (bgwriter)

173
Q

What does the WAL writer process do?

A

Flushes write-ahead log to disk

174
Q

What flushes write-ahead log data to disk?

A

The WAL writer process

175
Q

What background process automatically performs a checkpoint (ensures that dirty data blocks are written) at certain intervals or other configured conditions

A

The checkpointer process

176
Q

What does the checkpointer process do?

A

It performs checkpoints (syncing of dirty data blocks to disk) at intervals or otherwise according to configuration parameters

177
Q

Is there one autovacuum process?

A

There is one autovacuum launcher process, which launches multiple autovacuum workers processes

178
Q

What starts autovacuum worker processes?

A

The autovacuum launcher process

179
Q

What does the autovacuum launcher process do?

A

Launches autovacuum worker processes

180
Q

What do autovacuum worker processes do?

A

Recover free space for reuse

181
Q

What processes recover free space for reuse?

A

Autovacuum worker processes

182
Q

What background process routes log messages to syslog, eventlog, or log files

A

Logging collector

183
Q

What does the logging collector process do?

A

Routes log messages to syslog, eventlog or log files

184
Q

What background process collects usage statistics by relation and block?

A

Stats collector

185
Q

What does the stats collector background process do?

A

Collects usage statistics by relation and block

186
Q

What archives write-ahead log files

A

Archiver

187
Q

What does the archiver process do?

A

Archives write-ahead log files in pg_xlog when full (e.g. copies them to a mounted SAN share).

188
Q

What are the mechanisms for communication between the postmaster and its subordinate processes?

A

Shared memory and semaphores

189
Q

On what bases can a connection be authenticated?

A

IP address, user, password, key

190
Q

What does authorization consist of?

A

Verifying permissions in the database

191
Q

What is the PostgreSQL buffer cache called?

A

Shared buffers

192
Q

What is the purpose of the shared buffers?

A

To read OS and disk reads

193
Q

When are shared buffer blocks written to disk?

A

Shared buffer blocks are written to disk only when needed:

1) to make room for new block
2) at checkpoint time

194
Q

Describe the life history of write-ahead logging data

A

When DML is executed to change data, the changes are made to the data blocks in shared memory and also (in a different form) to WAL buffers in shared memory. The WAL writer process flushes WAL buffers to WAL segment files on disk (the “transaction log”) periodically, or on commit, or when the WAL buffers are full. As of 9.2, there is a group commit feature which attempts to batch together WAL-writing from multiple commits that occur nearly at the same time.

195
Q

Describe the state of data changes before commit, after commit, and after checkpoint

A
  • Before commit: changes are stored in memory in the shared data buffers and also in the WAL buffers. (It is possible under conditions of high activity and/or tight memory that changes may be forced out to WAL files and data files).
  • After commit: changes have been written to write-ahead log files on disk (but not necessarily to the data files).
  • After checkpoint: changes have been written from the shared buffers to data files.
196
Q

What are the three main stages of statement processing?

A

Parsing, optimizing, and execution

197
Q

What are the components of statement parsing?

A
  1. Syntax check
  2. Call Traffic Cop (what is that?)
  3. Identify query type
  4. Command processor if needed
  5. Break query in tokens
198
Q

What are the components of statement optimization?

A
  1. Planner generates plans using database statistics
  2. Query cost calculation
  3. Choose best plan
199
Q

What are the components of statement execution (after parsing and optimization are done)?

A

Haha, fooled you. Just one step: execution.

200
Q

What is a database cluster?

A

A cluster is a collection of one or more databases managed by one server instance

201
Q

Complete the sentence: if there are multiple clusters on a single computer, each one will have a different ______, ______, and ________.

A

Each cluster has a separate:

  • data directory
  • TCP port
  • set of processes
202
Q

What are the items that you will find in the /data directory by default?

A

PG_VERSION A file containing the major version number of PostgreSQL

base - Subdirectory containing per-database subdirectories

global - Subdirectory containing cluster-wide tables, such as pg_database

pg_clog - Subdirectory containing transaction commit status data

pg_multixact - Subdirectory containing multitransaction status data (used for shared row locks)

pg_notify - Subdirectory containing LISTEN/NOTIFY status data

pg_serial - Subdirectory containing information about committed serializable transactions

pg_snapshots - Subdirectory containing exported snapshots

pg_stat_tmp - Subdirectory containing temporary files for the statistics subsystem

pg_subtrans - Subdirectory containing subtransaction status data

pg_tblspc - Subdirectory containing symbolic links to tablespaces

pg_twophase - Subdirectory containing state files for prepared transactions

pg_xlog - Subdirectory containing WAL (Write Ahead Log) files

postmaster. opts - A file recording the command-line options the server was last started with
postmaster. pid A lock file recording the current postmaster process ID (PID), cluster data directory path, postmaster start timestamp, port number, Unix-domain socket directory path (empty on Windows), first valid listen_address (IP address or *, or empty if not listening on TCP), and shared memory segment ID (this file is not present after server shutdown)

Usually: postgresql.conf

Usually: pg_hba.conf

Usually: pg_ident.conf

203
Q

What does the data/global subdirectory contain?

A

Cluster-wide system tables, like pg_database

204
Q

What does the data/base subdirectory contain?

A

Per-database subdirectories for databases having data in the default tablespace

205
Q

What does the data/pg_xlog subdirectory contain?

A

Write Ahead Log files

206
Q

What does the data/pg_clog subdirectory contain?

A

Transaction commit status data

207
Q

What does the data/pg_subtrans subdirectory contain?

A

Subtransaction commit status data

208
Q

What does the pg_multixact subdirectory contain

A

Multitransaction status data (used for shared row locks)

209
Q

What does the pg_notify subdirectory contain?

A

LISTEN/NOTIFY status data

210
Q

What does the data/pg_serial subdirectory contain?

A

Information about committed serializable transactions

211
Q

What does the data/pg_snapshots subdirectory contain?

A

Exported snapshots

212
Q

What does the data/pg_stat_tmp subdirectory contain?

A

Temporary files for the statistics subsystem

213
Q

What does the data/tblspc subdirectory contain?

A

Symbolic links to tablespaces

214
Q

What is data/postmaster.opts?

A

A file recording the command-line options the server was last started with

215
Q

postmaster.pid

A

A lock file recording the current postmaster process ID (PID), cluster data directory path, postmaster start timestamp, port number, Unix-domain socket directory path (empty on Windows), first valid listen_address (IP address or *, or empty if not listening on TCP), and shared memory segment ID (this file is not present after server shutdown)

216
Q

What does the pg_twophase subdirectory contain?

A

State files for prepared transactions

217
Q

Describe the contents of the data/base directory

A

One directory per database in the cluster, the directory being named with the OID of the database. This is the default location for the cluster’s databases and files, and the system catalogs are stored here at a minimum

218
Q

Describe the contents of the data/base/NNN directory, where NNN is the OID of some database

A

One or more files for each table or index in the database. For ordinary relations, these files are named after the table or index’s filenode number, which can be found in pg_class.relfilenode. But for temporary relations, the file name is of the form tBBB_FFF, where BBB is the backend ID of the backend which created the file, and FFF is the filenode number. In either case, in addition to the main file (a/k/a main fork), each table and index has a free space map, which stores information about free space available in the relation. The free space map is stored in a file named with the filenode number plus the suffix _fsm. Tables also have a visibility map, stored in a fork with the suffix _vm, to track which pages are known to have no dead tuples. Unlogged tables and indexes have a third fork, known as the initialization fork, which is stored in a fork with the suffix _init.

219
Q

If a database has multiple tablespaces, where do those tablespaces go?

A

Each tablespace is a directory. base is the default tablespace. All other tablespaces can be located anywhere, but a soft link to each tablespace must be placed in data/tblspc/.

220
Q

Can a database have files in multiple tablespaces, and if so, what is the physical manifestation of this in the filesystem?

A

Yes, a database can have files in multiple tablespaces. Each tablespace has a subdirectory for each database that has files in the tablespace, so the OID of the tablespace may appear in multiple tablespace directories.

221
Q

What is the maximum size for a physical file representing a table or index?

A

1GB

222
Q

Given the maximum size for a physical table/index file, how is it that the maximum table size is 32 TB? Describe how files for tables/indexes are named

A
  • A table or index is stored in one or more physical files.
  • For non-temporary relations, the first such file is named as the relation’s file node number (pg_class.relfilenode), and subsequent files for the same relation are named _N where N is a serial number.
  • For temporary relations, the file name is of the form tBBB_FFF, where BBB is the backend ID of the backend which created the file, and FFF is the filenode number
223
Q

In addition to the data files for a table, what other files for the table may there be, and how are they named?

A
  • Visibility map: RELFILENODE_vm
  • Free space map: RELFILENODE_fsm
  • (For unlogged relations): initialization fork: RELFILENODE_init
224
Q

List the contents of the PostgreSQL installation directory

A
bin - programs
data - data directory
doc - documentation
include - header files
installer, scripts - installer files (EDB)
lib - libraries
pgAdmin III (EDB)
StackBuilder (EDB)
pg_env.{bat,sh}
225
Q

What are the five main components of a relation data page?

A
  • Page header - approx 24 bytes long; pointer(s) to free space in the page; general info
  • Row/index pointers - array of offset/length pairs pointing to row or index entries later in the page
  • Free space - unallocated space. New pointers are allocated from the front , new rows/index entries from the rear
  • Row/Index entries - actual row or index entries
  • Special - index access method-specific data (empty in regular tables) [Really? Empty?]
226
Q

Does Windows have locked user accounts?

A

No; a password is required

227
Q

What are the three installation options?

A
  • EDB One-Click Installer
  • OS system package (RPM/YUM, Debian/Ubuntu DEB, FreeBSD port, Solaris package, Mac OS X Homebrew
  • Source code
228
Q

What are five primary environment variables that affect the running of PostgreSQL commands, including starting the server. (Hint: the first one is not PostgreSQL-specific).

A

PATH - should include the correct PG bin directory
PGDATA - points to data cluster directory
PGPORT - point to port on which cluster is running
PGUSER - default database user name
PGDATABASE - default database

229
Q

How do you set environment variables in UNIX systems?

A

Edit your shell .profile or .bash_profile

230
Q

How do you set environment variables in Windows

A

Use the Windows My Computer properties page

231
Q

What command is used to create a PostgreSQL cluster?

A

initdb

232
Q

What is initdb?

A

initdb creates a database cluster’s data directory

233
Q

What is the syntax for initdb

A

initdb –D

  • a - specifies the authentication method for local users
  • D - Database cluster directory
  • U - Select the database super user name
  • E - Specify the database encoding
  • k –data-checksums - Use checksums on data pages to help detect corruption
  • W – prompt for superuser password
  • X, –xlogdir=XLOGDIR location for the transaction log directory
234
Q

What files should be edited after creating a new database cluster?

A

postgtesql.conf - to set the correct listening address and port, and to set appropriate configuration in general
pg_hba.conf - to define what users should be able to connect to which databases from which IP addresses

235
Q

What PG command is usually used to start and stop the cluster?

A

pg_ctl

236
Q

Describe the pg_ctl subcommands

A

−pg_ctl initdb [] - creates a new PostgreSQL database cluster
−pg_ctl start [] - Start the server
−pg_ctl stop [] - Stop the server
−pg_ctl restart [] - Restart the server
−pg_ctl status [] - Display server status
−pg_ctl reload [] - Reload configuration file
−pg_ctl promote [-D DATADIR] – Promote Standby to be Primary
−pg_ctl kill signal_name process_id – send a signal(ABRT HUP INT QUIT TERM USR1 USR2) to a process
−Pg_ctl register|unregister - a system service on Microsoft Windows

237
Q

What pg_ctl switch/option is particularly useful when stopping or restarting the server? Describe its usage

A

−-m smart (the defaults) waits for all clients to exit
−-m fast rolls back active transactions, closes open connections, and shuts down cleanly
−-m immediate performs an immediate, abnormal shutdown (i.e. a crash)

238
Q

Besides, -m, what are other useful pg_ctl options/switches?

A

−-D to specify an alternate cluster location
−-l to specify an alternate log file, when starting the server
−-c, –core-files allow postgres to produce core files
Starting and Stopping the Server (pg_ctl)

239
Q

Inside a psql session, how can you change the database connection?

A

\c [DBNAME [USERNAME]]

240
Q

What are six common connection settings?

A
  • listen_addresses (default localhost) - IP addresses to listen on; ‘*’ means all
  • port (default 5432) - port to listen on
  • max_connections (default 100) - max concurrent connections
  • superuser_reserved_connections (default 3) - number of connections reserved for superusers (out of the defined max_connections)
  • unix_socket_directory (default /tmp) - directory to be used for UNIX socket connections
  • unix_socket_permissions (default 0777) - access permissions of the UNIX-domain socket
241
Q

List seven security and authentication settings (hint: six have to do with ssl)

A
  • authentication_timeout (default 1 minute)
  • ssl (default: off) - enable SSL connections
  • ssl_ca_file - SSL certificate authority file
  • ssl_cert_file - SSL certification
  • ssl_key_file - SSL private key
  • ssl_ciphers - list of eligible SSL ciphers
  • ssl_renegotiation_limit (default 512 MB) - how much data can flow through the connection before renegotiation occurs
242
Q

List five memory settings

A
  • shared_buffers (default: <=128MB) - size of shared buffer pool; rule of thumb: 25% of system memory to a max of 8GB on Linux, or 512 MB on Windows
  • temp_buffers (default: 8 MB) - amount of memory used by each backend for caching temp table data
  • work_mem (default: 1MB) - amount of memory used for each sort or hash operation before switching to temporary disk files
  • maintenance_work_mem (default: 16 MB) - amount used for each index build or VACUUM
  • temp_file_limit (default: -1) - amount of disk space that a session can use for temporary files. Default is unlimited. Attempting to exceed the limit will abort a transaction.
243
Q

What is a rule of thumb for setting shared_buffers on Windows?

A

25% of system memory up to a maximum of 512 MB.

244
Q

What is a rule of thumb for setting shared_buffers on UNIX systems?

A

25% of system memory, up to a maximum of 8 GB.

245
Q

What is shared_buffers?

A

Memory setting controlling the size of the shared data buffer cache. Default is <= 128 MB.

246
Q

What is temp_buffers?

A

temp_buffers (default: 8MB): Amount of memory used by each backend for caching temporary table data.

247
Q

What is work_mem?

A

work_mem (default: 1MB): Amount of memory used for each sort or hash operation before switching to temporary disk files. Default is conservative, but don’t overdo it.

248
Q

What is maintenance_work_mem?

A

maintenance_work_mem (default: 16MB): Amount of memory used for each index build or VACUUM.

249
Q

What is temp_file_limit?

A

temp_file_limit (default -1): amount of disk space that a session can use for temporary files. A transaction attempting to exceed this limit will be cancelled. Default is unlimited.
Memory Settings

250
Q

What are three commonly used query planner settings? In addition to these, comment on the enable_* settings that also affect query planning.

A
  • random_page_cost (default 4.0): Estimated cost of a random page fetch, in abstract cost units. May need to be reduced to account for caching effects.
  • seq_page_cost (default 1.0): Estimated cost of a sequential page fetch, in abstract cost units. May need to be reduced to account for caching effects. Must always set random_page_cost >= seq_page_cost.
  • effective_cache_size (default 128M): Used to estimate the cost of an index scan. Rule of thumb is 75% of system memory.
  • There are plenty of enable_* parameters which influence the planner in choosing an optimal plan. For example:
  • enable_indexonlyscan enables or disables the query planner’s use of index-only-scan plan types
251
Q

What is random_page_cost?

A

random_page_cost (default 4.0): Estimated cost of a random page fetch, in abstract cost units. May need to be reduced to account for caching effects.

252
Q

What is seq_page_cost?

A

seq_page_cost (default 1.0): Estimated cost of a sequential page fetch, in abstract cost units. May need to be reduced to account for caching effects. Must always set random_page_cost >= seq_page_cost.

253
Q

What is effective_cache_size?

A

effective_cache_size (default 128M): Used to estimate the cost of an index scan. Rule of thumb is 75% of system memory.

254
Q

List five Write-Ahead Log settings

A
  • wal_level
  • fsync
  • wal_buffers
  • checkpoint_segments
  • checkpoint_timeout
255
Q

What is wal_level?

A

wal_level (default: minimal). Determines how much information is written to the WAL. Change this to enable WAL archiving/replication. Other values are archive and hot_standby.

256
Q

What is fsync?

A

fsync (default on): Turn this off to make your database much faster – and silently cause arbitrary corruption in case of a system crash.

257
Q

What is wal_buffers?

A

wal_buffers (default: -1, autotune): The amount of memory used in shared memory for WAL data. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers

258
Q

What is checkpoint_segments?

A

checkpoint_segments (default 3): Maximum number of 16MB WAL file segments between checkpoints. Default is too small!

259
Q

What is checkpoint_timeout?

A

checkpoint_timeout (default 5 minutes): Maximum time between checkpoints.

260
Q

What are seven settings controlling where to log error/diagnostic messages?

A
  • log_destination
  • logging_collector
  • log_directory
  • log_filename
  • log_file_mode
  • log_rotation_age
  • log_rotation_size
261
Q

What are the possible value of log_destination?

A

log_destination. Valid values are combinations of stderr, csvlog, syslog, and eventlog, depending on platform.

262
Q

What is the format convention for log_filename and what is an example?

A

strftime (but system strftime is not used so you can’t use local extensions). postgresql-%Y-%M-%d.log

263
Q

List four settings controlling when to log

A
  • client_min_messages (default NOTICE). Messages of this severity level or above are sent to the client.
  • log_min_messages (default WARNING). Messages of this severity level or above are sent to the server.
  • log_min_error_statement (default ERROR). When a message of this severity or higher is written to the server log, the statement that caused it is logged along with it.
  • log_min_duration_statement (default -1, disabled): When a statement runs for at least this long, it is written to the server log, with its duration.
264
Q

List eight settings controlling what to log

A
  • log_connections (default off): Log successful connections to the server log.
  • log_disconnections (default off): Log some information each time a session disconnects, including the duration of the session.
  • log_error_verbosity (default “default”): Can also select “terse” or “verbose”.
  • log_duration (default off): Log duration of each statement.
  • log_line_prefix: Additional details to log with each line.
  • log_statement (default none): Legal values are none, ddl, mod (DDL and all other data-modifying statements), or all.
  • log_temp_files (default -1): Log temporary files of this size or larger, in kilobytes.
  • log_checkpoints (default off): Causes checkpoints and restartpoints to be logged in the server log.
265
Q

What does log_temp_files do?

A

Logs temporary files of this size or larger, in kilobytes

266
Q

Describe three background writer settings

A
  • bgwriter_delay
  • bgwriter_lru_maxpages
  • bgwriter_lru_multiplier
267
Q

Describe the bgwriter_delay setting.

A

bgwriter_delay (default 200 ms): Specifies time between activity rounds for the background writer.

268
Q

Describe the bgwriter_lru_maxpages setting

A

bgwriter_lru_maxpages (default 100): Maximum number of pages that the background writer may clean per activity round.

269
Q

Describe the bgwriter_lru_multiplier setting

A

bgwriter_lru_multiplier (default 2.0): Multiplier on buffers scanned per round. By default, if system thinks 10 pages will be needed, it cleans 10 * bgwriter_lru_multiplier of 2.0 = 20.

270
Q

What is the primary background writer tuning technique

A

The primary background writer tuning technique is to lower the bgwriter_delay.

271
Q

Describe the search_path setting

A

search_path specifies the order in which schemas are searched. The default value for this parameter is: “$user”, public

272
Q

What setting specifies the order in which schemas are searched?

A

search_path

273
Q

What is the default_tablespace setting?

A

default_tablespace is the name of the tablespace in which objects are created by default

274
Q

What is the temp_tablespaces setting?

A

temp_tablespaces holds the tablespace name(s) in which objects are created by default. (The temp table load is evenly spread across the tablespaces in this list).

275
Q

What is the statement_timeout setting?

A

Any statement that takes more than the specified number of milliseconds will be aborted. The default value is 0 (no maximum statement time).

276
Q

What are some vacuum cost settings?

A
vacuum_cost_delay
vacuum_cost_page_hit
vacuum_cost_page_miss
vacuum_cost_page_dirty
vacuum_cost_limit
277
Q

What is the vacuum_cost_delay setting?

A

vacuum_cost_delay is the length of time, in milliseconds, that the process will wait when the cost limit it exceeded. Default for manual VACUUM is 0, so if you want a low-impact manual vacuum, you should set this to a non-zero value (autovacuum uses 20 ms ….)

278
Q

What is the autovacuum setting?

A

autovacuum (default on) turns on or off autovacuuming

279
Q

What is the log_autovacuum_min_duration setting, and what is the default?

A

Autovacuum tasks running longer than this duration (in milliseconds) are logged. -1 is the default, which disables this logging.

280
Q

What is the autovacuum_max_workers setting?

A

autovacuum_max_workers (default 3) is the max number of autovacuum worker processes that may be running at one time

281
Q

What is the setting to define the maximum number of autovacuum worker processes that may be running at once?

A

autovacuum_max_workers

282
Q

What is the name of the setting to disable or enable autovacuum?

A

autovacuum!

283
Q

What is the syntax for including a file in a configuration file?

A

include ‘filename’

284
Q

What is the syntax for including a directory of configuration files in a configuration file? Describe how this works.

A

include_dir ‘somedir’. All files named *.conf will be included from the named directory, in C locale filename order. Thus, you could name the files 00_foo.conf, 01_bar.conf, 02_baz.conf to control the order of loading while still having the names be meaningful.

285
Q

What is the setting to reserve connection slots for the DBA?

A

superuser_reserved_connections

286
Q

What setting allows you to log statements taking longer than the specified number of milliseconds?

A

log_min_duration_statement

287
Q

How would you configure the server to accept up to 200 connected users?

A

In postgresql.conf, set: max_connections = 200

and restart the server

288
Q

How would you configure the server to reserve 10 connection slots for the DBA?

A

In postgresql.conf, set: superuser_reserved_connections = 10

and restart the server

289
Q

How would you set the maximum time to complete client authentication to 10 seconds?

A

In postgresql.conf set: authentication_timeout = 10s

and reload the server

290
Q

How would you configure the server to save all the error messages in a file inside the pg_log folder in your cluster data directory?

A

Set logging_collector to on

291
Q

How would you log all queries and their time which are taking more than 5 seconds to execute?

A

Set log_min_duration_statement to 5000

292
Q

How would you log the users who are connecting to the database cluster?

A

Set log_connections to on and set the log_line_prefix to something include ‘%u’

293
Q
Change following autovacuum parameter on the production server:
−Autovacuum workers to 6
−Auto Vacuum threshold to 100
−Auto Vacuum scale factor to 0.3
−Auto Analyze threshold to 100
−Auto vacuum cost limit to 100
A
set autovacuum_max_workers to 6
set autovacuum_vacuum_threshold to 100
set autovacuum_vacuum_scale_factor to 0.3
set autovacuum_analyze_threshold to 100
set autoavacuum_cost_limit to 100
294
Q

In the PG object hierarchy, what are the three objects under “Database Cluster”

A

A database cluster contains:

  • Roles (users, groups)
  • Tablespaces
  • Databases
295
Q

What are the top-level objects within a database?

A

A database contains:

  • catalogs
  • extensions
  • schemas
296
Q

What are the objects within a schema?

A
  • tables
  • views
  • sequences
  • functions
  • event triggers
297
Q

How do you determine the list of databases in a cluster using SQL?

A

select datname from pg_database;

298
Q

How do you get the list of databases in a cluster using a PSQL meta command?

A

\l (lower-case L)

299
Q

What is the syntax of the SQL command for creating databases?

A

CREATE DATABASE name [ [ WITH ] [ OWNER [=] dbowner ] [ TEMPLATE [=] template ] [ ENCODING [=] encoding ] [ TABLESPACE [=] tablespace ] [ CONNECTION LIMIT [=] connlimit ] ]

300
Q

How do you create a schema named fooz owned by the user foozer?

A

create schema fooz authorization foozer;

301
Q

What is the default search_path?

A

“$user”, public

302
Q

What is the “current schema”

A

The current schema is the first schema in the search_path

303
Q

How does PG use the search_path

A

When trying to resolve object names (tables, etc), PG searches the schemas in the search_path in order. When creating a table, by default the table is created in the “current schema” – the first schema in the search_path.

304
Q

What is the only real difference between a user and a group?

A

Users are roles that can log into any database; groups are roles that can NOT log into any database.

305
Q

Do the following things in psql: 1) create the user ‘fred’ with password ‘flinstone’; 2) create a database ‘fred’ owned by ‘fred’; 3) connect to the ‘fred’ database as ‘fred’ and create the ‘fred’ schema; 4) log out.

A
> create user fred with password 'flinstone';
> create database fred with owner fred;
> \c fred fred
> create schema fred;
> \dn
> \q
306
Q

How do you retrieve a list of databases using SQL?

A

select datname from pg_database;
– Advanced: list databases with owners:
select d.datname, u.usename from pg_database d join pg_user u on (d.datdba = u.usesysid);

307
Q

How do you list databases using a psql metacommand?

A

\l (lower case L)

308
Q

How would you retrieve a list of tables in a specified database, including schema and owner?

A

Two methods:

  1. (psql meta-command) \d+
  2. (SQL) select schemaname, tablename, tableowner from pg_tables
309
Q

Describe the use of double-quotes in psql

A

Double-quotes are used to specify an exact name, preserving case

310
Q

Describe psql’s strange and redundant way of specifying database and user on the command line

A

There are two different and equivalent usages of psql:
1) psql [DBNAME [USER]]
and
2) psql -U USER DBNAME

311
Q

When invoking psql from the command line, what are the default values for USER and DBNAME?

A

If the PGUSER and PGDATABASE environment variables are not defined, the default values for USER and DBNAME are the name of the operating system user. If PGUSER and PGDATABASE are defined, those values are used. If just PGUSER is defined, that value is also the default database name.

312
Q

What psql meta-command is used to re-connect to a different database, and/or as a different user?

A

\c DBNAME [USER]

313
Q

If, from within psql, you want to reconnect to the same database as a different user, what is the short-hand syntax for that?

A

\c - NEWUSER

– (note the dash character)

314
Q

What psql meta-command is used to re-connect to a different database, and/or as a different user?

A

\c DBNAME [USER]

315
Q

If, from within psql, you want to reconnect to the same database as a different user, what is the short-hand syntax for that?

A

\c - NEWUSER

– (note the dash character)

316
Q

If the -h and -p switches are not used when invoking psql, how are the host and port determined?

A

1) If the PGHOST and PGPORT environment variables are defined, they are used. Otherwise, on UNIX systems, a local UNIX socket connection is attempted, or, on Windows, a local TCP connection.

317
Q

How can you have psql meta-commands executed whenever you connect to a database with psql?

A

psql always runs commands in ~/.psqlrc unless -X is specified

318
Q

How do you show command history in psql?

A

\s

319
Q

What does the \s command do in psql?

A

Shows or saves the command history

320
Q

How do you edit the query buffer in psql?

A

\e

321
Q

What does \e do in psql?

A

Edits (and then executes) the query buffer

322
Q

How do you save the query buffer in psql?

A

\w FILENAME

323
Q

What does \w do in psql?

A

Saves the query buffer (to the specified FILENAME)

324
Q

How do you set variables in psql?

A

Two methods.

1) (Command line): -v NAME=VALUE
2) (Inside psql): \set NAME VALUE

325
Q

How do you use a variable in psql?

A
Three ways:
1) Unadorned (numbers)
\:NAME
\set NAME 10
select :NAME;
2) Quoted strings
\:'NAME'
\set NAME testing
select :'NAME';
3) Identifiers
\:"NAME"
\set NAME empno
select :"NAME" from emp;
326
Q

How do you send query output from psql to a file (or pipe)

A
-o FILENAME or \o FILENAME (FILENAME may be a pipe)
Example:
\o | grep 4
select empno from emp;
  7499
  7654
  7844
  7934
(14 rows)
327
Q

How do you use a variable in psql?

A
Three ways:
1) Unadorned (numbers)
\:NAME
\set NAME 10
select :NAME;
2) Quoted strings
\:'NAME'
\set NAME 'testing'
select :'NAME';
3) Identifiers
\:"NAME"
\set NAME 'empno'
select :"NAME" from emp;
328
Q

How do you send query output from psql to a file (or pipe)

A

-o FILENAME or \o FILENAME (FILENAME may be a pipe)

329
Q

How do you execute the query buffer in psql?

A

\g [FILENAME] (filename may be a pipe)

330
Q

How do you output only tuples in psql, and what does that mean, actually?

A

“Tuples only” mode means that column headings are not output, and the final count of rows is not output. Within psql, \t toggle tuples-only output. On the psql command line, -t turns on tuples-only mode. An alternative to \t is \pset tuples_only

331
Q

What does \x do in psql?

A

Toggles expanded output, where columns are output as rows.

332
Q

How do you output only tuples in psql, and what does that mean, actually?

A

“Tuples only” mode means that column headings are not output, and the final count of rows is not output. Within psql, \x

333
Q

What does \echo do in psql, and does it have any options?

A

\echo [-n] [string]
Prints a string on STDOUT, followed by a newline, unless the -n switch is used, which suppresses the newline. The output of \echo is not affected by -o or \o.

334
Q

What does \qecho in psql do?

A

\qecho is like \echo, but its output is redirected by -o or \o.

335
Q

What psql command lists databases?

A

\l

336
Q

What are the attributes listed by \l for each database?

A

name, owner, encoding, collation, ctype, access privileges

337
Q

What additional attributes are listed by \l+ beyond those produced by \l?

A

size, tablespace, and descripion

338
Q

What does \dn do?

A

Lists schemas (namespaces)

339
Q

What attributes are included in \dn output?

A

name and owner

340
Q

What attributes are included in \dn+ output?

A

name, owner, access privileges, and description

341
Q

What does \df do?

A

List functions

342
Q

What attributes are included in \df output?

A

schema, name, result data type, argument data types, type

343
Q

What attributes are included in \df+ output?

A

Same as \df, but in addition: security, volatility, owner, language, source code

344
Q

\d(i,s,t,v,S)[+] [pattern]

A

List info about indexes, sequences, tables, views, or System objects; any combination of letters is possible

345
Q

What does \drds do in psql?

A

Lists per-database role settings

346
Q

What does \dp do in psql?

A

List access privileges (for tables, views, and sequences, by default)

347
Q

What does \ef do in psql?

A

This command fetches and edits the definition of the named function (Can also be used to create new functions - just don’t supply a name, and your editor will be invoked on a function definition template).

348
Q

What is an easy way to change the user’s password in psql?

A

\password

349
Q

In psql, how do you describe a table including descriptions of columns?

A

\d+ TABLENAME

350
Q

What does \d+ give you that \d doesn’t?

A

\d+ also gives you storage, stats target, and description

351
Q

How do you list schemas in psql?

A

\dn

352
Q

How do you list tablespaces in psql?

A

\db

353
Q

How do you list tables in psql?

A

\dt

354
Q

How do execute a SQL statement in psql, saving the output to a file.

A
Two ways:
1. With \g ("one-shot \o"):
select * from emp \g FILENAME
2. With \o:
\o FILENAME
select * from emp;
\o
3. From command line:
psql -U edbstore -o /tmp/emp.dat -c "select * from emp" edbstore
355
Q

In psql, execute a statement, saving just the data to a file, not the column headers.

A
\t
\o FILENAME
select * from emp;
\o
\t
356
Q

How do you execute a script with psql?

A

Three ways:

  1. \i FILENAME
  2. Command line: psql … < FILENAME
  3. Command line: psql … -f FILENAME …
357
Q

In psql, how do you list tables, views and sequences with their associated access privileges?

A

\dp

358
Q

In psql, which meta command displays the SQL text for a function?

A

Two approaches, sort of:

  1. \df+ myfunc (primitive)
  2. \ef myfunc (opens your EDITOR to allow you to view and edit source code of the function)
359
Q

In psql, how do you view the current working directory?

A

! pwd

360
Q

When trying to connect to a server, what does the message “Could not connect to Server: Connection refused” mean?

A

It means either that no PostgreSQL server is running on the specified host, or that it is not listening on the specified host and port.

361
Q

What are the three bogus levels of security according to EDB?

A
  • Server/application
  • Database
  • Object
362
Q

According to EDB, what step(s) does the Server level of security consist of?

A

Checking pg_hba.conf: there must be an entry matching type (e.g. host, hostssl), database name, user name, client IP address, and authentication method (md5, trust, etc).

363
Q

According to EDB, what does the “Database” level of security consist of?

A
  • Checking user/password combo
  • CONNECT privilege on the database
  • SCHEMA permissions
364
Q

According to EDB, what does the “Object” level of security consist of?

A
  • Object (e.g. table) level privileges, administered with GRANT and REVOKE
365
Q

What are the data fields in each line of pg_hba.conf?

A
  • Type (e.g. host, hostssl)
  • Database name (or ‘all’)
  • User name (or ‘all’)
  • Host spec, incl IPv4, IPv6, or DNS hostname
  • Client IP address modified by CIDR mask
  • authentication method (trust, reject, md5, password, gss, sspi, krb5, ident, peer, pam, ldap, radius or cert.)
366
Q

Name as many of the 13 authentication methods as you can

A

trust, reject, md5, password, gss, sspi, krb5, ident, peer, pam, ldap, radius or cert.

367
Q

In pg_hba.conf, what is a ‘special’ database name value that must be used for, e.g. pg_basebackup connections

A

replication

368
Q

How do you give access to a user someuser connecting to somedb from a specific IP address a.b.c.d, using md5. Be as restrictive as possible. Don’t forget any of the steps required to allow the client to connect.

A
  • Add a line to pg_hba.conf (usually in the cluster’s data directory, unless you have specified that it go elsewhere), like:
    host somedb someuser a.b.c.d/32 md5
  • Reload the server to activate the change to pg_hba.conf - send a HUP signal to the postmaster, or execute the SQL “select pg_reload_conf();” as a superuser, or use “pg_ctl -D datadir reload” (or the OS service equivalent that wraps pg_ctl).
369
Q

How would you set up default privileges so that whenever ddl_user creates tables in schema public, readonly_user is granted the ability to select on the tables.

A

ALTER DEFAULT PRIVILEGES FOR ddl_user IN SCHEMA public GRANT SELECT ON TABLES
TO readonly_user;

370
Q

In order to drop a user that owns objects, what do you need to do first?

A

DROP OWNED or REASSIGN OWNED

371
Q

What privileges are required to access objects in a particular database?

A
  • ) pg_hba.conf - ability to authenticate
  • ) CONNECT privilege on the database (this is default)
  • ) USAGE privilege on the relevant schema
  • ) SELECT, UPDATE, INSERT, DELETE, EXECUTE, etc privilege on the object in the schema (grant some_privs on all tables in schema foo to some_user)
372
Q

What is the type of an 8-byte integer, and what is an alias?

A

bigint (int8)

373
Q

What is the type of a 4-byte auto-incrementing integer?

A

serial

374
Q

What is the type of an 8-byte auto-incrementing integer?

A

bigserial

375
Q

What is the type of binary data (“byte array”)

A

bytea

376
Q

What is the type of a 4-byte floating point number?

A

real (float4)

377
Q

What is the type of an 8-byte floating point number?

A

double precision (float8)

378
Q

What is the specific type intended to be used for a currency amount?

A

money

379
Q

What is an exact numeric of selectable precision?

A

numeric(p,s) (p=precision, the total number of digits, and s=scale, the number of digits in the fractional part)

380
Q

Does PostgreSQL support JSON?

A

Yes, it has a ‘json’ data type for storing JSON (JavaScript Object Notation) data

381
Q

What range types does PostgreSQL support?

A

int4range, int8range, numrange, tsrange, tstzrange, daterange (note: no floating point range)

382
Q

When specifying a column in a CREATE TABLE statement, which comes first, the column name, or the type?

A

The column name comes first, before the column type.

383
Q

Give an example of an insert statement

A

insert into departments(dep_id, name) values (1, ‘Finance’);

384
Q

How can you insert multiple rows with a single INSERT statement?

A

insert into departments(dep_id, name) values (1, ‘Finance’), (2, ‘Silly Walks’);

385
Q

Give an example of an update statement

A

update departments set name=’development’ where dep_id = 1;

386
Q

Give an example of a delete statement

A

delete from departments where department_id = 2

387
Q

How and why would you use dollar quoting?

A

$$Don’t want no stinkin’ single quotes$$
– or:
$foo$Don’t want it$foo$

388
Q

What are double quotes used for?

A

Double quotes are used to delimit database object names that clash with keywords, contain mixed case, or contain special characters (something other than a-z, 0-9, or underscore).

389
Q

What is the SQL concatenation operator?

A

||

390
Q

Name five types of constraints

A
check constraints
not-null constraints
unique constraints
primary keys
foreign keys
391
Q

Give the syntax for CREATE SEQUENCE

A

CREATE [TEMPORARY | TEMP] SEQUENCE name [INCREMENT [ BY ] increment]
[ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
[ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]

392
Q

How do you get the next value from the sequence named myseq?

A

nextval(‘myseq’)

393
Q
What is the sequence value returned by nextval after these commands:
CREATE SEQUENCE serial START 101;
SELECT nextval('serial');
A

101

394
Q

What does the nextval function do, and describe its usage

A

Advances the sequence and returns a new value. The single argument should be the name of the sequence, as a string.

395
Q

What does the currval function do?

A

currval returns the most recently used value for a specific sequence.

396
Q

What is the result of the following two commands:
CREATE SEQUENCE serial START 101;
SELECT currval(‘serial’);

A

An error; currval is the most recently used/allocated value for the sequence, so it is undefined until nextval() has been called.

397
Q

What does the setval function do?

A

Sets the next value to be returned by the sequence.

398
Q

Does PostgreSQL have updatable views?

A

Not by default; you have to create rules in order to get updatable views.

399
Q

If managers are only supposed to be able to see employee data for their own department, i.e. does it provide row-level security, to define a view that is department specific like this: create view sales_emp as select * from emp where deptno = 30 and only give managers from deptno 30 the ability to run this view?

A

No, it is not secure to do so, unless the view is create with the “with (security_barrier)” option.

400
Q

Does the optimizer treat a view as a full-fledged query, or a subquery?

A

Subquery. Don’t ask me what this means, though.

401
Q

What are materialized views?

A

PG 9.3 has materialized views, which are like pre-computed views, with the option to refresh the snapshot of data stored in the materialized view. You could also view materialized views as being like tables that can only be populated with a single query.

402
Q

How do you create a materialized view

A

CREATE MATERIALIZED VIEW myview AS SELECT blah, blah blah;

403
Q

How to you update the data in a materialized view?

A

REFRESH MATERIALIZED VIEW myview;

404
Q

What is a lateral subquery?

A

Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to reference columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item.) LATERAL is primarily useful when the cross-referenced column is necessary for computing the row(s) to be joined. A common application is providing an argument value for a set-returning function.

405
Q

What index types are supported by PostgreSQL?

A
  • B-tree (default)
  • Hash (not crash safe)
  • Index on expressions (use when quick retrieval is needed on a frequently used expression)
  • Partial index (index only rows that satisfy the WHERE clause, which need not include the indexed column; a query must use the same WHERE clause in order to use the partial index)
  • SP-GiST indexes (space-partitioned GiST supported partitioned search trees)
406
Q

What is the syntax for CREATE INDEX?

A

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC |
DESC ] [ NULLS { FIRST | LAST } ] [, …] )
[ WITH ( storage_parameter = value [, … ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]

407
Q

Is it necessary to wrap the “ON “ clause of a JOIN query in parentheses?

A

No. (However, it is necessary to wrap the column argument of “USING “ in parentheses in a JOIN query that uses one).

408
Q

What does “TABLE “ do?

A

It is equivalent to “SELECT * FROM “

409
Q

Are OIDs dumped by default by pg_dump? Comment on any relevant command line switches.

A

No, OIDs are not dumped by default. Use the -o (little o) switch to dump OIDs

410
Q

What does the -j switch to pg_dump do, and what other command line options must be specified in order to make use of it?

A

You must use the -Fd (directory format) switch along with -j. E.g.: pg_dump -Fd -j 4 would dump to directory format using 4 parallel jobs.

411
Q

If you do a no-frills dump of a database, e.g. pg_dump dbname > dbname.sql, what is the easiest, most basic way to restore it?

A

It should be restored into an empty database. If you drop the database dbname and recreate it, then you can run:
psql dbname < dbname.sql

412
Q

Are all backups made by pg_dump, including non-text backups, portable across architectures?

A

Yes, all backups made by pg_dump are portable across architectures.

413
Q

List as many of the access privilege codes (as seen in psql \dp output, for instance) as you can

A
r -- SELECT ("read")
            w -- UPDATE ("write")
            a -- INSERT ("append")
            d -- DELETE
            D -- TRUNCATE
            x -- REFERENCES
            t -- TRIGGER
            X -- EXECUTE
            U -- USAGE
            C -- CREATE
            c -- CONNECT
            T -- TEMPORARY
414
Q

How do you know when a (point-in-time) recovery is done?

A

recovery.conf in the data directory will have been renamed to recovery.done

415
Q

If you just want to recover up to the most recent WAL file during a recovery, what do you need to put in recovery.conf?

A

Just:

restore_command = ‘cp /mnt/server/archivedir/%f “%p”’ or what have you

416
Q

If you want to recover to a specific timestamp, what do you need to put in recovery.conf?

A
recover_command = 'cp /mnt/server/archivedir/%f "%p"'
recover_target_time = '