databases Flashcards

1
Q

top down database design

A

create ER diagrams (graphical description): start with a set of system requirements and identify entities and attributes then construct relational data model i.e. tables for entities

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

bottom up database design

A

start with initial tables for entities and their attributes –> redesign in a “better” way: trickier for larger databases so need a formalization of process

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

purpose of normalization

A
  • relations represent real world entities
  • single valued columns
  • avoid redundancy
  • easier to update and maintain correctly without anomalies
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

no redundancy

A

each data item is stored only once: -minimises space required, - simplifies maintenance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

consequence of redundancy

A
  • takes up unnecessary space
  • when modifying data, have to modify in different places
  • risk of inconsistencies
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

exception for redundancy

A

foreign keys: act as pointers

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

what causes redundancy

A
  • set valued attributes –> multiple rows in corresponding table. e.g. more than one attribute for a particular entry in the table
  • dependency e.g. postcode and town are dependent on each other
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

define redundancy

A

repeating data in multiple different locations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

modification anomaly

A

failure to maintain all existing instances of a specific value

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

deletion anomaly

A

losing other values as a side effect of deleting data. e.g. deleting staff member and deleting info of their workplace simultaneously if unnormalised

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

insertion anomaly

A

when adding more data items, much more irrelevant data needs to be added. adding rows forces us to add info about other entities –> can lead to inconsistency
e.g. adding details of new surgery, with no staff, add null to staff fields

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

overcoming redundancy

A

schema refinement (decomposition): use 2+ relations to store the original data. can be done manually for smaller tables but formalisation needed for larger DB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

functional dependencies specify…

A
  • specify which are candidate, primary and foreign keys

- specify which attributes to combine in the new tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

simple key

A

key consists of only one attribute

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

composite key

A

key consists of several attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

candidate key

A

minimal set of attributes whose values uniquely identify tuples.
functionally determines all attributes in a relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

primary key

A

the candidate key selected to identify rows uniquely within a table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

what is functional dependency

A

describes relationship between two attributes in the same relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

if B is functionally dependent on A

A

Represent as A -> B
If we know attribute values for A, then we know the unique attribute values of B.
Each value of A is associated with exactly one value of B

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

determinant

A

set of attributes on the left hand side

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

dependent

A

set of attributes on the right hand side

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

full functional dependency

A

B is not functionally dependent on any proper subset of A, i.e. B is functionally dependent on ALL the primary key if composite key, not just individual parts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

partial functional dependency

A

B remains functionally dependent on at least one proper subset of A

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

transitive functional dependency

A

if A->B and B->C functional dependencies exist, then the functional dependency A -> C also exists = transitive

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

how to determine functional dependencies

A

either -obvious/common sense

or -require discussion/specification from customers

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

closure, F+ (+ is raised), of a set of functional dependencies, F

A

set of all functional dependencies implied by dependencies in F. inference rules required to compute this

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Armstrong’s axioms (complete and sound)

A
  1. reflexivity (if B is a subset of A, then A -> B)
  2. augmentation (A -> B then A, C -> B, C)
  3. transitivity (if A ->B and B-> C then A->C)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

what does it mean by inference rules being complete

A

given a set X functional dependencies, all dependencies implied by X can be derived from X using these rules

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

what does it mean by inference rules being sound

A

no additional functional dependencies (not implied by X) can be derived from these rules

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

further rules derived from armstrong’s axioms

A
  • decomposition: if A -> B, C then A -> B and A -> C
  • union: if A -> B and A -> C then A -> B, C
  • composition: if A -> B and C -> D then A, C -> B, D
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

proof of union rule using armstrong’s axioms

A

A-> B: augmentation with C: A, C -> B, C
A -> C: augmentation with A: A, A -> A, C ( = A -> A,C)
transitivity from above 2 dependencies: A -> B, C

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

pseudocode for computing the closure F+

A

F+ = F (initialisation)
repeat until F+ doesn’t change any more:
apply rules of reflexivity and augmentation to each functional dependency, f in set F+, and add these new func. dep. to F+.
for each f1, f2: if imply a func dep f3 using transitivity then add to F+

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

F = set of transitive dependencies
A = set of attributes in a relation
what is A+

A

set of all attributes that can be implied by the attributes of A using functional dependencies from F

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

how to compute A+

A

similarly to F+, start with functional dependencies of F, with attributes A on the LHS and repeatedly apply Armstrongs axioms

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

candidate key in terms of relational dependencies

A

minimal set of attributes such that A+ (under F+) includes/ which functionally determine all attributes in a relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

1NF

A
  • no repeating groups i.e. an attribute that occurs with multiple values for a single occurrence of the primary key
  • no identical rows
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

why not repeat columns horizontally to get into 1NF

A
  • waste of space, not all column spaces used
  • harder to query and reference specific columns
  • need a fixed upper limit on number of repetitions
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

why not one long string with all attributes for 1NF

A

-harder to query

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

1NF: one table solution

A

-repeat data for each attribute so each belongs to one entry, but redundant data (i.e. fill in blanks/flatten the table)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

1NF: two table solution

A

-place repeating data in a separate relation, with original primary key as a foreign key. iterate until no repeated groups remain

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

2NF

A
  • 1NF

- no partial functional dependencies: every non-key attribute is dependent on the whole of the composite primary key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

How to place in 2NF

A

-remove partially dependent attributes and place in separate relation with the copy of their determinant

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

3NF

A
  • 2NF
  • no transitive functional dependencies: no non-key attribute is transitively dependent on the primary key: all attributes are dependent on the key, the whole key and nothing but the key
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q

How to place in 3NF

A

-remove transitively dependent attributes and place in new relation. take the attributes of their determinant as the primary key in the new table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q

how to find primary key

A
  • intuitive observations from data

- computing functional dependency closure

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q

how to depict functional dependencies

A

-dependency diagram

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
47
Q

Limitations of file based approach

A
  • Data duplication in different programs: different values/formats/waste of space
  • Incompatibility: programs written in different languages- difficult to access others’ files
  • Data dependencies: structure of the file is defined in the code of the program
  • Separation and isolation of data: programs maintain their own data and users may be unaware of useful data in other programs
  • Limited queries: each program has a defined purpose, for different queries, more application programs have to be designed
  • Hard to recover when crashes occur
  • Inefficient searching large data sets
  • Many users: controlling simultaneous access = locking+ can’t hide sensitive data = inefficient
  • Not synchronised: inconsistencies+lost updates
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
48
Q

reason for limitations of file based approach

A
  • definition of data is embedded in the application program (instead of stored separately and independently)
  • no central control over the access/manipulation of data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
49
Q

database

A

a large collection of logically related data, designed to meet the needs of an organisation:

  • single repository of data
  • minimum duplication
  • large databases have a data dictionary (metadata repository i.e. data describing the data)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
50
Q

DBMS

A

a software system that enables the user to define/create/maintain/control access to the DB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
51
Q

basic features of DBMS

A
  • DDL (data definition language): defines the database- specify the structure, data types + constraints
  • DML (data manipulation language): allows users to insert/delete/update/retrieve data from the database. Query language is the part of this that retrieves data. unlimited queries (adv.) Efficiency: good DBMS can answer SQL queries quickly
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
52
Q

DBMS allows controlled access to the DB

A
  • security system
  • concurrency control
  • recovery control
53
Q

DB application program

A

computer program that interacts with the user + DBMS. Sends SQL statements to the DBMS + can be conventional (local) or online applications

54
Q

Components of DBMS environment

A
  • hardware: PC or computer network
  • software: DBMS, application program, OS (network software if necessary)
  • data: used by the organisation. schema = abstract description of the data
  • procedures: documented instructions on how to use/run the system e.g. how to log on/use an application/make backups
  • person: anyone involved in the system
55
Q

people in a DBMS environment

A
  • DB designers
  • DB admins: maintenance of DBMS+OS, security+integrity control, satisfactory performance of user applications
  • application developers
  • end user
56
Q

DB designers

A
  • Logical designer: what to store- entities, attributes, relationships+constraints
  • Physical designer: maps logical designs to tables+integrity constraints, selects access methods+storage structures
57
Q

advantages of databases

A
  • shared data
  • concurrency control
  • better data backup/recovery procedures
  • consistency
  • less redundancy
  • economy of scale
  • easier data access
  • data security+integrity
  • faster development of new applications
58
Q

disadvantages of databases

A
  • more hardware
  • complexity/size of system
  • high cost of implementation of DBMS
  • slow processing of some applications
  • high impact of failure
59
Q

need to trust a DBMS so have mechanisms to ensure the database is

A

-reliable
-always in a consistent state
especially when hardware/software failures and multiple users access the database simultaneously

60
Q

database recovery

A

process of restoring database to a correct state after a failure

61
Q

concurrency control protocols

A

prevent database accesses interfering with each other

62
Q

transaction

A

an action carried out by a single person/program which reads/updates the database

63
Q

during a transaction vs after a transaction

A

database may not be in a consistent state vs database in a consistent state + valid integrity/referential constraints

64
Q

transaction outcomes

A

committed- completes successfully
rollback- not completed successfully
(performed entirely or not at all)

65
Q

concurrency control

A

process of managing simultaneous operations on the DB without having them interfere with each other
operations may be correct individually but cause inconsistency when executed simultaneously

66
Q

how is DBMS different from multi user OS

A

-an OS allows two people to edit a document at the same time but if both try to write simultaneously -> ones changes get lost

67
Q

issue with DDL

A

too low level to describe data organisation in a simple way for users: so have data model

68
Q

data model

A

a collection of intuitive concepts describing entities, relationships + constraints

69
Q

three characterisations of data

A
  • structured
  • semi structured
  • unstructured
70
Q

-structured data

A

data is represented in a strict format (relational data model)
DBMS checks the data follows: the structures and integrity/referential constraints specified in the schema

71
Q

-semi structured data

A

self describing data, schema info is mixed in with the data values
ad hoc collected data, when not known how it will be stored/managed in advance
may have some structure but not all data has the same structure, each data object has different attributes not known in advance

72
Q

-unstructured data

A

very little indication of the type/structure of data

e. g. text document with some info in it
e. g. web doc with some HTML in it

73
Q

relational data model

A
relations = tables
attributes = columns
tuples = rows
74
Q

issue with relational data model

A

too low level for big companies so express in a non-technical way: E-R model

75
Q

E-R model

A

top down approach to DB design, graphical description
represents entities, attributes and relationships and constraints on these
uses UML notation + crows foot notation

76
Q

3-Level ANSI-SPARC architecture

A

external level: user’s view of data
conceptual level: logical structure of data as seen by DB administrator
internal level: physical representation of data: algorithms, data structures

77
Q

DB schema

A

abstract description of the collection of data in a DB:

defines schema, domain for each attribute and specifies integrity constraints

78
Q

DB instance

A

data at a particular moment

79
Q

objectives of DB schema

A

all users have access at every point to the same DB instance w/ customised views of parts of data

80
Q

data independence

A

upper level of DB schema unaffected by changes to lower levels
e.g. physical data independence: if internal schema changed- conceptual level unchanged, and users only notice a change in performance

81
Q

three main phases of database design

A

conceptual design
logical design
physical design

82
Q

conceptual design

A

acts as fundamental understanding of DB
high level graphical representation via ER diagrams based off user requirements specification (completely independent of physical considerations)

83
Q

logical design

A

create relational data model using conceptual design then normalise to eliminate redundancy/anomalies

84
Q

physical design

A

describe database implementation of logical design:

  • specific storage structures/access methods/security protection
  • aim is optimum performance
85
Q

classification of design phases into 3-level ANSI-SPARC

A

external schema + conceptual schema =logical/conceptual database design
internal schema+physical storage =physical DB design

86
Q

relational database

A

collection of normalised relations

87
Q

a relation is ‘normalised’ if

A

it is appropriately structured i.e. one value per cell and no repeating rows

88
Q

domain

A

set of allowable values for an attribute

89
Q

degree of a relation

A

number of attributes

90
Q

cardinality of a relation

A

number of tuples

91
Q

tuple

A

row of a relation with concrete values for each of the attributes

92
Q

attribute

A

named column of a relation: each with a unique name- properties/common characteristics shared by all entity instances of an entity type

93
Q

relation

A

table with rows and columns

94
Q

how does this differ from mathematical relations

A

ordering of attributes doesn’t matter

95
Q

foreign key

A

attribute in one table A whose values must match the primary key of another table B (or be null) then A references B

96
Q

entity integrity

A

every attribute of a primary key can’t be null:

ensures every entity has a unique identifier and foreign key values can reference primary key values

97
Q

referential integrity

A

ensures a foreign key matches the primary key from another relation or are null:
ensures references between tables are valid and prevents deleting a row in a table if there is a foreign key in another relation referencing it

98
Q

types of relations

A

base relations: physically stored in the database
view (virtual relation): not physically stored in the database but derived from content of base relations. used to show customised info to every user and calculate dynamic quantities. computed upon request from user and changes when base relations change.

99
Q

alternatives to relational data model

A
  • network data model: tuples modelled as nodes and relationships (foreign keys) as edges
  • hierarchal data model: type of NDM but graph is a tree graph and structure models ‘parent-child’ relationship
    (limitations: deleting parent or adding record without a parent)
100
Q

objective of ER model

A

ER model (graphical representation of DB):
understand nature and relationships among data
help derive tables in RDM

101
Q

entity

A

data objects: real thing (or abstract notion) that we recognise as a separate concern within the database

102
Q

entity type

A

group of all objects with the same properties, identified as having an independent existence

103
Q

entity occurence

A

a uniquely identifiable instance of an entity type

104
Q

relationship

A

named association between two entity types which has some meaning in the context of the database

105
Q

cardinality of a relationship

A

the number of entity occurrences that are related to a single occurrence of an associated entity type through this relationship

106
Q

entity participates ‘optionally’ (indicated by O on other side of relationship line)

A

it has partial participation otherwise has total participation (indicated by line on other side of relationship line)

107
Q

simple attribute

A

consists of a single component- can’t be broken down further

108
Q

composite attribute

A

multiple components: can be subdivided into smaller components

109
Q

single valued attribute
multi valued attribute
derived attribute

A
  • only holds one value
  • it can hold many values
  • it can be derived from the value of a related attribute
110
Q

literals
{a}, [a], (…)
|

A

constants used in SQL statements
{a} = required element [a] = optional element
… = optional repetition
| = choice

111
Q

SQL is ‘free format’

A

free format = parts of statements don’t need to be written on specific parts of the screen

112
Q

types of query languages

A

SQL: formal definition of a new relation (which data is to be retrieved) from existing DB relations
relational algebra: how to build a new relation from existing DB relations

113
Q

‘WHERE’
‘SELECT DISTINCT’
testing for null

A

filters rows according to some condition
eliminates duplicates
WHERE comment IS NULL

114
Q

% vs _

A
% = wildcard, represents an arbitrary sequence of 0 or more characters
_ = represents an arbitrary single character
115
Q

aggregate function

A

operates on a single column and returns a single numerical value

116
Q

GROUP BY

A

partitions data into groups, producing single summary row

117
Q

types of subquery

A
  • single-value/scalar subquery: single column and simple row
  • multiple-value subquery: single column and multiple rows
  • table subquery: multiple rows and columns
118
Q

nested queries

A
  • two scalar queries ‘WHERE branchNo = (select…)

- scalar + mulivalued subquery ‘WHERE branchNo IN (select…)

119
Q

multivalued subquery

A

-ANY or SOME before subquery so WHERE is true if satisfied by at least one value returned by subquery and ALL if satisfied by all values returned by subquery
‘WHERE salary > SOME (select …)

120
Q

alias for a column

A

in the FROM clause leave a space and write alias for column

121
Q

inner join vs natural inner join

A

natural inner join demands equality for columns with the same name in the two tables e.g. a.salary = b.salary
but inner join without ‘natural’ can demand different relations between column values

122
Q

inner join vs outer join

A

inner join: omits all rows that don’t satisfy join conditions
outer join: retains (some of) the rows that don’t satisfy the join conditions
(left outer join: retains rows of the left table that are unmatched with rows from right table.
full outer join: retains unmatched rows from both tables (fills necessary fields with NULL)

123
Q

database updates

A
INSERT INTO TableName [(columnList)]
VALUES (data ValueList)
----
UPDATE TableName
SET columnName1 = dataValue1 [,columnName2= data Value2..]
[WHERE searchCondition]
-----
DELETE FROM TableName
[WHERE searchCondition]
124
Q

DDL basic commands

A

CREATE table: assign name to table and defines names and domains to each column in table
ALTER table: amends relation schema/table structure bc of design error or design change
DROP table: deletes a table
Specify integrity + referential constraints: -PRIMARY KEY -FOREIGN KEY
Create view- creates virtual table derived from base relation according to query to provide user with a virtual relation

125
Q

BIT(n)
INTEGER
SMALLINT
NUMERIC(p,d)

A

BIT(n)- bit string of length n
INTEGER- large +ve or -ve integer
SMALLINT- small +ve or -ve integer
NUMERIC(p,d)- +ve/-ve decimal number with at most p precision (num digits) and d scale (num digits after .)

126
Q

define custom domain types + with constraints

A

CREATE DOMAIN Name AS VARCHAR(10);
CREATE DOMAIN SEX AS CHAR(1)
CHECK (VALUE IN (‘M’, ‘F’))
or replace bracket (‘M’, ‘F’) with a SELECT statement

127
Q

view

A

relation that depends on other relations and isn’t physically stored as a table

128
Q

query language is relationally complete if

A

it can be used to produce any relation that can be created from relation calculus/algebra expressions

129
Q

teleprocessing architecture

A

one computer with single CPU:
many end terminals cabled to central computer, all processing done in central computer: recieves requests, processes and responds with information
downsizing to more cost effective PC network with better/same performance: dec performance bc burden on central computer