Databases and distributed systems Flashcards by Unknown Unknown

what is a relational database?

a database made up of multiple tables of related data, linked to each other using relationships

relational database pros
(name 3)

less data redundancy (repeated data less common)
data consistency (fewer contradictory entries)
better security (can restrict viewing access)
better flexibility (easy to add new table)
data independence (data can be used for multiple things without customisation)

relational database cons

greater complexity
can take longer to retrieve data if it is across multiple tables
can be overkill for simple sets of data

primary key

a field (or combination of fields) that uniquely identifies each record in a table

foreign key

a field in one table that holds the primary key of another table, used to link the two tables together
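
As a minimal sketch of how the two kinds of key work together, the example below uses Python's built-in sqlite3 module. The customer and orders tables, their columns and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
           item TEXT NOT NULL
       )"""
)

conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (10, 1, 'keyboard')")
conn.execute("INSERT INTO orders VALUES (11, 1, 'mouse')")

# The foreign key lets the two tables be joined back together.
rows = conn.execute(
    """SELECT customer.name, orders.item
       FROM orders JOIN customer ON orders.customer_id = customer.customer_id
       ORDER BY orders.order_id"""
).fetchall()

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, 'monitor')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The customer's name is stored once in customer and reached from orders through customer_id, which is why a relational design has less redundant data.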

data consistency

ensuring that the constraints or limitations placed on the database are adhered to

data redundancy

duplication of data, data held in multiple locations
harder to maintain accuracy if the data changes

data integrity

correctness of the data over time

validation checks (name 4)

type check (correct data type)
length check
range check
format check (conforms to set of rules)
presence check
check digit
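
Each check can be sketched as a small Python function. The function names, the sample product-code pattern and the choice of ISBN-13 for the check digit are assumptions made for illustration; a real system would use whatever rules its own data requires.

```python
import re

def type_check(value):
    """Type check: the value must be a whole number (bool is excluded)."""
    return isinstance(value, int) and not isinstance(value, bool)

def length_check(text, min_len, max_len):
    """Length check: the number of characters must be within the limits."""
    return min_len <= len(text) <= max_len

def range_check(number, low, high):
    """Range check: the number must lie between low and high inclusive."""
    return low <= number <= high

def format_check(text, pattern):
    """Format check: the whole string must match the pattern."""
    return re.fullmatch(pattern, text) is not None

def presence_check(text):
    """Presence check: something other than whitespace must be entered."""
    return text is not None and text.strip() != ""

def check_digit_ok(isbn13):
    """Check digit: ISBN-13 weights its digits 1,3,1,3,... and the
    weighted total must be a multiple of 10."""
    if not (len(isbn13) == 13 and isbn13.isascii() and isbn13.isdigit()):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn13))
    return total % 10 == 0
```

For example, format_check("AB1234", r"[A-Z]{2}\d{4}") accepts two capital letters followed by four digits, and check_digit_ok("9780306406157") accepts a valid ISBN-13. Validation only shows that data is reasonable, not that it is correct.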

verification check

double entry (enter data twice to ensure accuracy)
proof reading (read over input to manually check)

field

one category of data in a table (a column)

relationship

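a link between two tables
usually one-to-many
one-to-one is possible but rarely needed (e.g. keeping passwords in a separate table)
many-to-many can’t be implemented directly and must be broken down into two one-to-many relationships using a link table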
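
A many-to-many relationship is normally handled by placing a link table between the two tables, so that each side has a one-to-many relationship with the link table. The sketch below uses Python's built-in sqlite3 module; the student, course and enrolment tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- Link table: each row pairs one student with one course.
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ben'), (2, 'Chloe');
    INSERT INTO course  VALUES (1, 'Maths'), (2, 'Physics');
    INSERT INTO enrolment VALUES (1, 1), (1, 2), (2, 1);
""")

# Follow the link table from course to student to list who takes Maths.
maths_students = [name for (name,) in conn.execute(
    """SELECT student.name FROM enrolment
       JOIN student ON student.student_id = enrolment.student_id
       JOIN course  ON course.course_id  = enrolment.course_id
       WHERE course.title = 'Maths' ORDER BY student.name""")]
```

One student can appear in many enrolment rows and so can one course, but every enrolment row refers to exactly one of each.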

normalisation

the process of organising data into related tables to reduce redundancy, e.g. converting a flat file database into a relational database

1NF

each field only contains one piece of data (atomic values)
each record has a primary key
no repeating groups of fields

2NF

1NF + no partial dependencies (every non-key field depends on the whole of the primary key, not just part of a composite key)

3NF

2NF + no transitive dependencies (no non-key field depends on another non-key field)
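
Removing a transitive dependency can be sketched with plain Python data structures. The flat-file rows and field names below are made up for illustration: customer_name depends on customer_id, which in turn depends on order_id.

```python
# Flat file: customer details are repeated on every order row.
flat_rows = [
    {"order_id": 1, "customer_id": 7, "customer_name": "Dee", "item": "pen"},
    {"order_id": 2, "customer_id": 7, "customer_name": "Dee", "item": "ink"},
    {"order_id": 3, "customer_id": 9, "customer_name": "Eli", "item": "pad"},
]

# customer_name depends on customer_id, not directly on order_id
# (a transitive dependency), so it moves to its own table keyed by customer_id.
customers = {row["customer_id"]: row["customer_name"] for row in flat_rows}

# The orders table keeps customer_id as a foreign key into customers.
orders = [
    {"order_id": r["order_id"], "customer_id": r["customer_id"], "item": r["item"]}
    for r in flat_rows
]
```

After the split each customer name is stored once, so changing it means updating a single entry rather than every order row.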

what is a database management system (DBMS)?

a piece of software that sits between the user and the data
allows the user to manipulate the data and controls different users’ access to it

name 4 DBMS functions

checks data for inconsistencies
provides security (access rights)
backing up data
performs calculations on data retrieved from DB
management of data dictionaries

big data

set of data that is too large to be processed using traditional methods
can be classed as big data when:
- volume: there is too much data to fit on a single server
- velocity: new data arrives at a very high rate
- variety: the data is of such different types that it can’t be put into a table or series of tables

methods of working with big data

data warehousing
data mining
predictive analysis

data warehousing

storing large, complex and not directly related data items in a way to make it readily retrievable and understandable

data mining

process of retrieving data from large data sets with a view to identifying patterns
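
A very small illustration of looking for a pattern is counting which items are bought together most often. The shopping baskets below are invented sample data, and real data mining works on far larger sets with more sophisticated techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: one set of items per transaction.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk", "jam"},
]

# Count every pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is the pattern of interest.
most_common_pair, count = pair_counts.most_common(1)[0]
```

Here the pair bread and butter appears in three of the four baskets, which is the kind of pattern a shop might use when deciding where to place products.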

predictive analysis

using techniques performed on big data to predict what might happen in the future
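
One simple technique of this kind is fitting a straight line to past values and extending it forwards. The sketch below uses least-squares fitting on invented monthly sales figures; it is an illustration of the idea rather than a method the flashcards prescribe.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [110, 120, 130, 140]

slope, intercept = fit_line(months, sales)

# Extend the line one month into the future.
forecast_month_5 = slope * 5 + intercept
```

With these figures sales rise by 10 each month, so the line predicts 150 for month 5.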

distributed system

system in which both processing and storage functions are spread across multiple geographical locations

distributed system pros (name 3)

- performance more consistent as no system-wide bottlenecks
- adding new node to system is easy as system already made of nodes
- errors more likely to be contained to a node
- even if one node fails the bulk of the system still works
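
The point about node failure can be sketched with a toy model in which records are spread across nodes by record number. The Node class, the node names and the placement rule are all assumptions made for illustration; real distributed systems also replicate data so that a failed node's records are not lost.

```python
class Node:
    """One machine in a hypothetical distributed store."""
    def __init__(self, name):
        self.name = name
        self.records = {}
        self.online = True

nodes = [Node("london"), Node("paris"), Node("tokyo")]

def node_for(record_id):
    # Simple placement rule: spread records evenly by record number.
    return nodes[record_id % len(nodes)]

# Store six records across the three nodes.
for record_id in range(1, 7):
    node_for(record_id).records[record_id] = f"record {record_id}"

# Simulate one node failing.
nodes[1].online = False

# Records held on the remaining nodes are still available.
reachable = sorted(
    record_id
    for node in nodes if node.online
    for record_id in node.records
)
```

Only the records held on the failed node (1 and 4) become unavailable; the other four can still be read from the remaining nodes.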

distributed system cons (name 3)

- without proper planning data transmission can be inefficient, which will waste bandwidth
- more difficult to prevent data duplication
- security can be harder to enforce in every node + multiple entry points
- more complex to set up and maintain

paper 2 (26 cards)