Databases and distributed systems Flashcards
paper 2
what is a relational database?
a database made up of multiple tables of related data, linked to one another by relationships
relational database pros
(name 3)
less data redundancy (repeated data less common)
data consistency (fewer contradictory entries)
better security (can restrict viewing access)
better flexibility (easy to add new table)
data independence (data can be used for multiple things without customisation)
relational database cons
greater complexity
can take longer to retrieve data if it is across multiple tables
can be overkill for simple sets of data
primary key
field that uniquely identifies each record in a table
foreign key
the primary key of one table stored as a field in another table, linking the two tables together
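The two key cards above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customer and orders tables, their columns and the sample rows are made up for the example.

```python
import sqlite3

# Hypothetical schema: table names, column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id),"  # foreign key
    " item TEXT NOT NULL)"
)

conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (100, 1, 'keyboard')")

# A primary key must be unique, so a second customer with id 1 is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A foreign key must match an existing primary key, so customer 99 is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 'mouse')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key is what lets the two tables be joined back together.
rows = conn.execute(
    "SELECT customer.name, orders.item FROM orders"
    " JOIN customer ON orders.customer_id = customer.customer_id"
).fetchall()
print(duplicate_rejected, orphan_rejected, rows)
```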
data consistency
ensuring that the constraints or limitations placed on the database are adhered to
data redundancy
duplication of data, data held in multiple locations
harder to maintain accuracy if the data changes
data integrity
correctness of the data over time
validation checks (name 4)
type check (correct data type)
length check
range check
format check (conforms to set of rules)
presence check
check digit
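A minimal sketch of how each validation check above might look in Python. The function names are hypothetical, and the check digit example assumes the ISBN-13 scheme (weights alternating 1 and 3), which is one common scheme among several.

```python
import re

def presence_check(value):
    """Presence check: something has actually been entered."""
    return value is not None and str(value).strip() != ""

def type_check(value, expected_type):
    """Type check: the value is of the expected data type."""
    return isinstance(value, expected_type)

def length_check(value, min_len, max_len):
    """Length check: the number of characters is within the allowed limits."""
    return min_len <= len(value) <= max_len

def range_check(value, low, high):
    """Range check: the value lies between a lower and an upper bound."""
    return low <= value <= high

def format_check(value, pattern):
    """Format check: the whole value matches a set pattern."""
    return re.fullmatch(pattern, value) is not None

def isbn13_check_digit(code):
    """Check digit: the last digit must agree with one calculated from the first 12."""
    if re.fullmatch(r"[0-9]{13}", code) is None:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(code[:12]))
    return (10 - total % 10) % 10 == int(code[12])

print(range_check(17, 0, 120), isbn13_check_digit("9780306406157"))
```

Validation only shows that the data is reasonable, not that it is true: an age of 17 passes the range check even if the person is really 71.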
verification check
double entry (enter data twice to ensure accuracy)
proof reading (read over input to manually check)
field
one category of data in a table (a column)
relationship
a link between two tables
can have one-to-many
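one-to-one is possible but rarely needed (e.g. keeping passwords in a separate table)
many-to-many can’t be implemented directly, so it is split into two one-to-many relationships using a link table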
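A many-to-many relationship is normally not stored directly. One common approach is a link table that turns it into two one-to-many relationships. The sketch below uses sqlite3 with made-up student, course and enrolment tables.

```python
import sqlite3

# Hypothetical schema: students take many courses, and courses have many students.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Link table: one row per (student, course) pairing.
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO course VALUES (10, 'Maths'), (20, 'Physics');
INSERT INTO enrolment VALUES (1, 10), (1, 20), (2, 10);
""")

# One course, many students (via the link table).
maths_students = [row[0] for row in conn.execute(
    "SELECT student.name FROM enrolment"
    " JOIN student ON enrolment.student_id = student.student_id"
    " JOIN course ON enrolment.course_id = course.course_id"
    " WHERE course.title = 'Maths' ORDER BY student.name"
)]

# One student, many courses (via the same link table).
asha_courses = [row[0] for row in conn.execute(
    "SELECT course.title FROM enrolment"
    " JOIN student ON enrolment.student_id = student.student_id"
    " JOIN course ON enrolment.course_id = course.course_id"
    " WHERE student.name = 'Asha' ORDER BY course.title"
)]
print(maths_students, asha_courses)
```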
normalisation
the process of organising data into related tables to reduce redundancy, e.g. converting a flat file database into a relational database
1NF
each field only contains one piece of data
all attributes dependent on the primary key
no repeating groups of fields
2NF
1NF + no partial dependencies (no non-key field depends on only part of a composite primary key)
3NF
2NF + no transitive dependencies (no non-key field depends on another non-key field)
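As a rough illustration of removing a transitive dependency, the sketch below splits a made-up flat file of orders into two tables. The field names and rows are hypothetical. The customer name and town depend on customer_id rather than on order_id, so they move to their own table.

```python
# Hypothetical flat file: customer details are repeated on every order row.
flat_file = [
    {"order_id": 100, "customer_id": 1, "customer_name": "Asha", "town": "Leeds", "item": "keyboard"},
    {"order_id": 101, "customer_id": 1, "customer_name": "Asha", "town": "Leeds", "item": "mouse"},
    {"order_id": 102, "customer_id": 2, "customer_name": "Ben", "town": "York", "item": "monitor"},
]

customers = {}  # keyed by customer_id (primary key)
orders = []     # each order keeps customer_id as a foreign key

for row in flat_file:
    customers[row["customer_id"]] = {
        "customer_name": row["customer_name"],
        "town": row["town"],
    }
    orders.append({
        "order_id": row["order_id"],
        "customer_id": row["customer_id"],
        "item": row["item"],
    })

# Each customer's details are now stored once, so a change is made in one place.
customers[1]["town"] = "Hull"

# Joining the tables back together shows the change on every one of Asha's orders.
rebuilt = [{**order, **customers[order["customer_id"]]} for order in orders]
print(len(customers), len(orders), rebuilt[0]["town"], rebuilt[1]["town"])
```

In the flat file the same change would have to be made on every matching row, which is where inconsistencies tend to creep in.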
what is a database management system (DBMS)?
piece of software that sits between user and the data
allows users to manipulate the data and controls different users’ access to it
name 4 DBMS functions
checks data for inconsistencies
provides security (access rights)
backing up data
performs calculations on data retrieved from DB
management of data dictionaries
big data
set of data that is too large to be processed using traditional methods
can be classed as big data when:
- volume: there is so much data that it can’t fit on a single server
- velocity: the rate at which new data arrives is very high
- variety: the data is of so many different types that it can’t be put into a table or series of tables
methods of working with big data
data warehousing
data mining
predictive analysis
data warehousing
storing large, complex and not directly related data items in a way that makes them readily retrievable and understandable
data mining
process of retrieving data from large data sets with a view to identifying patterns
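A toy sketch of pattern finding, assuming a handful of made-up shopping baskets. Real data mining works on far larger data sets and uses more sophisticated techniques, but the idea of counting which items occur together is the same.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is the pattern the data suggests.
most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)
```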
predictive analysis
using techniques performed on big data to predict what might happen in the future
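One simple predictive technique is fitting a straight line to past data and extending it forwards. The sketch below assumes made-up monthly sales figures and a hypothetical fit_line helper. It uses ordinary least squares, which is only one of many possible methods.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: sales in months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [110, 120, 130, 140, 150]

slope, intercept = fit_line(months, sales)

# Extend the trend to predict month 6.
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```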
distributed system
system in which both processing and storage functions are spread across multiple geographical locations
distributed system pros (name 3)
- performance more consistent as no system-wide bottlenecks
- adding new node to system is easy as system already made of nodes
- errors more likely to be contained to a node
- even if one node fails the bulk of the system still works
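The point about node failure can be sketched with a toy model. The Node and Cluster classes below are hypothetical, and the sketch assumes each item is deliberately copied to two nodes, which is one common design rather than the only one.

```python
class Node:
    """One machine in the system, with its own local storage."""
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.online = True

class Cluster:
    """Toy distributed store: each key is kept on `replicas` neighbouring nodes."""
    def __init__(self, nodes, replicas=2):
        self.nodes = nodes
        self.replicas = replicas

    def _owners(self, key):
        # Deterministic placement: pick a starting node from the key's bytes.
        start = sum(key.encode()) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, key, value):
        for node in self._owners(key):
            if node.online:
                node.store[key] = value

    def get(self, key):
        # Try each copy in turn; an offline node is simply skipped.
        for node in self._owners(key):
            if node.online and key in node.store:
                return node.store[key]
        raise KeyError(key)

cluster = Cluster([Node("london"), Node("leeds"), Node("york")])
cluster.put("user:1", "Asha")
copies = sum("user:1" in node.store for node in cluster.nodes)

# Simulate a failure of the first node holding the data.
cluster._owners("user:1")[0].online = False
value_after_failure = cluster.get("user:1")
print(copies, value_after_failure)
```

Keeping two copies is also why controlling data duplication is harder in a distributed system: the copies have to be kept in step with one another.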
distributed system cons (name 3)
- without proper planning, data transmission can be inefficient, which wastes bandwidth
- more difficult to prevent data duplication
- security is harder to enforce because every node must be secured and there are multiple entry points
- more complex to set up and maintain