Databases and distributed systems Flashcards

paper 2

1
Q

what is a relational database?

A

a database made up of multiple tables that hold related data, linked to each other through relationships (primary and foreign keys)

2
Q

relational database pros
(name 3)

A

less data redundancy (repeated data less common)
data consistency (fewer contradictory entries)
better security (can restrict viewing access)
better flexibility (easy to add new tables)
data independence (the same data can be used for multiple purposes without restructuring it)

3
Q

relational database cons

A

greater complexity
can take longer to retrieve data if it is across multiple tables
can be overkill for simple sets of data

4
Q

primary key

A

a field (or combination of fields) that uniquely identifies each record in a table

5
Q

foreign key

A

a field in one table that holds the primary key of another table, used to link the two tables together
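A minimal sketch of primary and foreign keys, using Python's built-in `sqlite3` module. The `customer` and `booking` tables and their columns are made up for illustration; they are not part of the flashcards.

```python
import sqlite3

# In-memory database; all table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- primary key: unique for each customer
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE booking (
        booking_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
        booking_date TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO booking VALUES (10, 1, '2024-05-01')")

# Follow the foreign key to join the two tables.
rows = conn.execute("""
    SELECT booking.booking_id, customer.name
    FROM booking
    JOIN customer ON booking.customer_id = customer.customer_id
""").fetchall()

# A booking that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO booking VALUES (11, 99, '2024-05-02')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here `rows` holds the booking joined to its customer, and `orphan_rejected` records that the database refused a foreign key with no matching primary key.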

6
Q

data consistency

A

ensuring that the constraints or limitations placed on the database are adhered to
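One way to picture constraints being enforced is to let the database reject rows that break them. The sketch below uses `sqlite3` with a hypothetical `student` table; the constraint values (year groups 7 to 13) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,                          -- must be unique
        email      TEXT NOT NULL UNIQUE,                         -- no duplicates, no blanks
        year_group INTEGER CHECK (year_group BETWEEN 7 AND 13)   -- allowed range
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'a@example.com', 12)")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `try_insert((3, 'c@example.com', 20))` returns `False` because the year group breaks the `CHECK` constraint, so the stored data stays consistent with the rules.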

7
Q

data redundancy

A

duplication of data, data held in multiple locations
harder to maintain accuracy if the data changes

8
Q

data integrity

A

the accuracy and correctness of the data over time

9
Q

validation checks (name 4)

A

type check (correct data type)
length check
range check
format check (conforms to set of rules)
presence check
check digit
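The checks above can be sketched as small Python functions. The function names are hypothetical, and the check digit example assumes the ISBN-13 scheme (weights alternating 1 and 3), which is only one of several check digit methods.

```python
import re


def type_check(value):
    """Type check: the value must be an integer (booleans are excluded)."""
    return isinstance(value, int) and not isinstance(value, bool)


def length_check(text, min_len, max_len):
    """Length check: number of characters is within the allowed limits."""
    return min_len <= len(text) <= max_len


def range_check(number, low, high):
    """Range check: the value lies between low and high inclusive."""
    return low <= number <= high


def format_check(text, pattern):
    """Format check: the whole string must match the given regular expression."""
    return re.fullmatch(pattern, text) is not None


def presence_check(text):
    """Presence check: something other than whitespace has been entered."""
    return text is not None and text.strip() != ""


def isbn13_check_digit(first12):
    """Check digit: compute the 13th digit of an ISBN from its first 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10
```

For example, `format_check("2024-05-01", r"\d{4}-\d{2}-\d{2}")` accepts a date written as YYYY-MM-DD. Note that validation only checks that data is reasonable, not that it is actually correct.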

10
Q

verification check

A

double entry (enter data twice to ensure accuracy)
proof reading (read over input to manually check)
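Double entry can be sketched as comparing the two typed versions and reporting where they disagree. The function name below is hypothetical.

```python
def double_entry_differences(first, second):
    """Return the positions where two entries differ (an empty list means they match)."""
    diffs = [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
    # Extra characters at the end of the longer entry also count as differences.
    shorter, longer = sorted((len(first), len(second)))
    diffs.extend(range(shorter, longer))
    return diffs
```

An empty result means the second entry confirms the first; anything else would prompt the user to re-enter the data.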

11
Q

field

A

one category of data in a table (a column)

12
Q

relationship

A

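a link between two tables
one-to-many is the usual type
one-to-one is possible but rarely needed (e.g. keeping passwords in a separate table)
many-to-many can't be implemented directly; it is split into two one-to-many relationships using a link table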
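A many-to-many link is commonly handled by adding a link table that has a one-to-many relationship with each side. The sketch below uses `sqlite3`; the `student`, `course` and `enrolment` tables are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);

    -- Link table: one row for each (student, course) pairing.
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO course  VALUES (1, 'Maths'), (2, 'Physics');
    INSERT INTO enrolment VALUES (1, 1), (1, 2), (2, 1);
""")


def courses_for(student_name):
    """List the course titles a student is enrolled on, via the link table."""
    rows = conn.execute("""
        SELECT course.title
        FROM student
        JOIN enrolment ON enrolment.student_id = student.student_id
        JOIN course    ON course.course_id = enrolment.course_id
        WHERE student.name = ?
        ORDER BY course.title
    """, (student_name,)).fetchall()
    return [title for (title,) in rows]
```

Each student can appear in many `enrolment` rows and so can each course, which gives the many-to-many effect using only one-to-many relationships.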

13
Q

normalisation

A

the process of organising data into related tables to reduce redundancy, e.g. converting a flat file database into a relational database

14
Q

1NF

A

each field contains only one piece of data (atomic values)
no repeating groups of fields
each record is uniquely identified by a primary key

15
Q

2NF

A

1NF + no partial dependencies (every non-key field depends on the whole of the primary key, not just part of a composite key)

16
Q

3NF

A

2NF + no transitive dependencies (non-key fields depend only on the primary key, not on other non-key fields)
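As an illustration of removing a transitive dependency, the sketch below starts from a flat list of orders in which `customer_name` depends on `customer_id` rather than on `order_id`, and splits the customer details into their own table. The data and function name are made up for the example.

```python
# Flat data: the customer's name is repeated on every one of their orders.
flat_orders = [
    {"order_id": 1, "customer_id": 7, "customer_name": "Asha", "item": "pen"},
    {"order_id": 2, "customer_id": 7, "customer_name": "Asha", "item": "ink"},
    {"order_id": 3, "customer_id": 9, "customer_name": "Ben", "item": "pad"},
]


def split_out_customers(rows):
    """Split flat order rows into a customer table and an order table."""
    customers = {}  # keyed by customer_id (the primary key)
    orders = []
    for row in rows:
        # Each customer's details are stored once, however many orders they have.
        customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
        # The order keeps only customer_id, which acts as a foreign key.
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "item": row["item"],
        })
    return customers, orders


customers, orders = split_out_customers(flat_orders)
```

After the split, a change of name only has to be made in one place, which is the practical benefit of reaching 3NF.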

17
Q

what is a database management system (DBMS)?

A

piece of software that sits between the user and the data
allows users to manipulate the data and controls different users' access to it

18
Q

name 4 DBMS functions

A

checks data for inconsistencies
provides security (access rights)
backing up data
performs calculations on data retrieved from DB
management of data dictionaries

19
Q

big data

A

set of data that is too large or complex to be processed using traditional methods
can be classed as big data when:
- volume: there is so much data that it can't fit on a single server
- velocity: the rate at which new data arrives is very high
- variety: the data is of such different types that it can't be put into a table or series of tables

20
Q

methods of working with big data

A

data warehousing
data mining
predictive analysis

21
Q

data warehousing

A

storing large, complex and not directly related data items in a way that makes them readily retrievable and understandable

22
Q

data mining

A

process of searching through large data sets with a view to identifying patterns
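A very small sketch of pattern finding: counting which pairs of items appear together most often in a set of shopping baskets. The data and the `frequent_pairs` name are made up, and real data mining works on far larger data sets with more sophisticated techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]


def frequent_pairs(transactions, min_count):
    """Return item pairs that occur together in at least min_count transactions."""
    counts = Counter()
    for items in transactions:
        # Sorting gives each pair a consistent order, e.g. ('bread', 'butter').
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

With a threshold of 2, the pairs that survive are the ones bought together more than once, which is the kind of pattern a retailer might act on.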

23
Q

predictive analysis

A

applying analytical techniques to big data to predict what might happen in the future
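One simple predictive technique is fitting a straight line to past values and extending it forward. The sketch below uses least-squares fitting on made-up monthly sales figures; it is only one of many possible methods.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [10, 12, 14, 16]

slope, intercept = fit_line(months, sales)
forecast_month_5 = slope * 5 + intercept  # extend the trend one month ahead
```

The forecast is only as good as the assumption that the past trend continues.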

24
Q

distributed system

A

system in which both processing and storage functions are spread across multiple geographical locations

25
Q

distributed system pros (name 3)

A

- performance is more consistent as there are no system-wide bottlenecks
- adding a new node is easy as the system is already made of nodes
- errors are more likely to be contained to a single node
- even if one node fails, the bulk of the system still works

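The last point can be pictured with a toy model in which keys are spread across nodes and one node goes down. The `Cluster` class and `reachable_keys` helper are hypothetical and ignore replication, networking and everything else a real distributed system needs.

```python
import zlib


class Cluster:
    """Toy cluster: each key is stored on exactly one node, chosen by hashing."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.down = set()  # names of nodes that have failed

    def _node_for(self, key):
        names = sorted(self.nodes)
        return names[zlib.crc32(key.encode()) % len(names)]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        node = self._node_for(key)
        if node in self.down:
            raise ConnectionError(f"{node} is unavailable")
        return self.nodes[node][key]


def reachable_keys(cluster, keys):
    """Count how many keys can still be read from the cluster."""
    count = 0
    for key in keys:
        try:
            cluster.get(key)
            count += 1
        except ConnectionError:
            pass  # this key lives on a failed node
    return count
```

If one node is added to `cluster.down`, only the keys stored on that node become unreachable; requests for keys on the other nodes carry on as normal.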
26
Q

distributed system cons (name 3)

A

- without proper planning, data transmission can be inefficient, which wastes bandwidth
- more difficult to prevent data duplication
- security is harder to enforce, as every node must be secured and there are multiple entry points
- more complex to set up and maintain