Databases & Big Data Flashcards
What is an entity?
An object, person, place or event about which data is stored in a database.
What are attributes?
Characteristics or other information about entities.
What is an entity identifier?
Unique attribute given to an entity.
What is an entity description?
Describes how information about an entity is stored in a table, written as Entity(Attribute1, Attribute2, ...) with the entity identifier underlined.
What are Relational Databases?
Databases in which tables are related to one another - linked by common attributes.
What is a primary key in databases?
Attribute which provides a unique identifier for every entity within a table.
What is a foreign key?
Attribute within a table which is the primary key of another table, used to link the two tables.
What is a composite primary key?
Primary key formed by a combination of attributes.
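A minimal sketch of primary, foreign and composite primary keys, using Python's built-in sqlite3 module with a hypothetical school database (the Student, Course and Enrolment tables and their attributes are illustrative, not from the source). Each attempted insert shows which rows the keys accept or reject.

```python
import sqlite3

# Hypothetical school database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

# Primary key: StudentID uniquely identifies each student.
conn.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("CREATE TABLE Course (CourseID TEXT PRIMARY KEY, Title TEXT NOT NULL)")

# Foreign keys: StudentID and CourseID each refer to another table's primary key.
# Composite primary key: neither is unique on its own here, but the pair is.
conn.execute("""
    CREATE TABLE Enrolment (
        StudentID INTEGER REFERENCES Student(StudentID),
        CourseID  TEXT    REFERENCES Course(CourseID),
        PRIMARY KEY (StudentID, CourseID)
    )""")

conn.execute("INSERT INTO Student VALUES (1, 'Ada')")
conn.execute("INSERT INTO Course VALUES ('CS', 'Computer Science')")
conn.execute("INSERT INTO Enrolment VALUES (1, 'CS')")

def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

valid_ok = try_insert("INSERT INTO Student VALUES (2, 'Bob')")                 # new key: accepted
duplicate_pk_ok = try_insert("INSERT INTO Student VALUES (1, 'Eve')")          # repeated primary key
dangling_fk_ok = try_insert("INSERT INTO Enrolment VALUES (99, 'CS')")         # no student 99 exists
duplicate_composite_ok = try_insert("INSERT INTO Enrolment VALUES (1, 'CS')")  # repeated key pair
print(valid_ok, duplicate_pk_ok, dangling_fk_ok, duplicate_composite_ok)  # True False False False
```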
Why are databases normalised?
To make the database efficient without compromising the integrity of the data.
To ensure there is no redundant or repeated data.
What is first normal form?
First stage of normalisation.
Table contains no repeating attributes or groups of attributes.
Atomic - each field holds only a single value.
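A minimal sketch of the atomic rule, assuming a hypothetical Student table whose Subjects field holds a comma-separated list. Splitting that field gives one value per field; the repeated Name values that result are what second normal form then deals with.

```python
# Unnormalised: the Subjects field holds several values at once.
unnormalised = [
    {"StudentID": 1, "Name": "Ada", "Subjects": "Maths, Physics"},
    {"StudentID": 2, "Name": "Alan", "Subjects": "Computing"},
]

def to_first_normal_form(rows):
    """Split the multi-valued Subjects field so each row holds one atomic value."""
    result = []
    for row in rows:
        for subject in row["Subjects"].split(","):
            result.append({"StudentID": row["StudentID"],
                           "Name": row["Name"],
                           "Subject": subject.strip()})
    return result

first_nf = to_first_normal_form(unnormalised)
for row in first_nf:
    print(row)
```

After the split, a single StudentID no longer identifies a row, so (StudentID, Subject) becomes a composite key.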
What is second normal form?
Table is in first normal form and partial key dependencies are removed.
- Partial key dependency: an attribute that depends on only part of a composite key, not the whole key.
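A minimal sketch of removing a partial key dependency, using hypothetical data keyed on the composite key (StudentID, Subject). Name depends only on StudentID, so it moves to its own table, while Grade depends on the whole key and stays.

```python
# First normal form, keyed on the composite key (StudentID, Subject).
# Name depends only on StudentID: a partial key dependency.
enrolments_1nf = [
    {"StudentID": 1, "Subject": "Maths",     "Name": "Ada",  "Grade": "A"},
    {"StudentID": 1, "Subject": "Physics",   "Name": "Ada",  "Grade": "B"},
    {"StudentID": 2, "Subject": "Computing", "Name": "Alan", "Grade": "A"},
]

def to_second_normal_form(rows):
    """Move the partially dependent attribute (Name) into its own table."""
    students = {}  # StudentID -> Name, stored once per student
    results = []   # attributes that depend on the whole composite key
    for row in rows:
        students[row["StudentID"]] = row["Name"]
        results.append({"StudentID": row["StudentID"],
                        "Subject": row["Subject"],
                        "Grade": row["Grade"]})
    return students, results

students, results = to_second_normal_form(enrolments_1nf)
print(students)  # {1: 'Ada', 2: 'Alan'}
```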
What is third normal form?
Table is in second normal form and has no non-key dependencies - no non-key attribute depends on another non-key attribute.
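A minimal sketch of removing a non-key dependency, assuming a hypothetical Student table in which TutorName depends on TutorID, which is not the key. TutorName moves to a separate Tutor table and TutorID remains as a foreign key.

```python
# Second normal form (single key StudentID), but TutorName depends on TutorID,
# a non-key attribute: a non-key dependency.
students_2nf = [
    {"StudentID": 1, "Name": "Ada",   "TutorID": "T1", "TutorName": "Mr Smith"},
    {"StudentID": 2, "Name": "Alan",  "TutorID": "T1", "TutorName": "Mr Smith"},
    {"StudentID": 3, "Name": "Grace", "TutorID": "T2", "TutorName": "Ms Jones"},
]

def to_third_normal_form(rows):
    """Split off the attribute that depends on a non-key attribute."""
    students = []
    tutors = {}  # TutorID -> TutorName, stored once per tutor
    for row in rows:
        tutors[row["TutorID"]] = row["TutorName"]
        students.append({"StudentID": row["StudentID"],
                         "Name": row["Name"],
                         "TutorID": row["TutorID"]})  # kept as a foreign key
    return students, tutors

students_3nf, tutors = to_third_normal_form(students_2nf)
print(tutors)  # {'T1': 'Mr Smith', 'T2': 'Ms Jones'}
```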
What are client-server databases?
Database which allows simultaneous access for multiple clients.
What is concurrent access?
When two users attempt to access the same field at the same time.
This can result in database updates being lost.
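A minimal sketch of how an update is lost, with the interleaving of two hypothetical clients written out step by step. Both read the same value before either writes, so the second write overwrites the first.

```python
# A hypothetical stock-level record shared by two clients.
record = {"stock": 10}

# Both clients read the same value before either writes.
client_a_copy = record["stock"]
client_b_copy = record["stock"]

client_a_copy -= 3  # client A sells 3 items
client_b_copy -= 2  # client B sells 2 items

record["stock"] = client_a_copy  # A writes 7
record["stock"] = client_b_copy  # B writes 8, overwriting A's update

print(record["stock"])  # 8, although 10 - 3 - 2 = 5
```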
How can concurrent access issues be managed?
Record locks.
Serialisation
Timestamp / commitment ordering.
What are record locks?
A record is locked when a user accesses it, so no other user can modify it. It is unlocked when that user has finished using it.
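A minimal sketch of record locking, using Python's threading.Lock as a stand-in for the lock a DBMS would hold on a record (the LockedRecord class is hypothetical). Each client holds the lock for its whole read-modify-write, so no update is lost.

```python
import threading

class LockedRecord:
    """A hypothetical record that a client must lock before reading and updating."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def update(self, change):
        # Lock for the whole read-modify-write; other clients wait until it is released.
        with self._lock:
            current = self.value
            self.value = current + change
        # Leaving the with-block unlocks the record.

record = LockedRecord(10)
clients = [threading.Thread(target=record.update, args=(-1,)) for _ in range(5)]
for c in clients:
    c.start()
for c in clients:
    c.join()
print(record.value)  # 5
```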
What is serialisation?
Requests from other users are placed in a queue - when the first user has finished, the next command in the queue is executed.
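A minimal sketch of serialisation, assuming a hypothetical in-memory queue of update commands. Commands are executed one at a time in arrival order, so each one sees the result of the previous one.

```python
from collections import deque

record = {"stock": 10}
requests = deque()  # hypothetical queue of pending update commands

def submit(client, change):
    requests.append((client, change))

def process_all():
    """Execute queued commands one at a time, in the order they were queued."""
    log = []
    while requests:
        client, change = requests.popleft()
        record["stock"] += change
        log.append((client, record["stock"]))
    return log

submit("A", -3)
submit("B", -2)
log = process_all()
print(log)  # [('A', 7), ('B', 5)]
```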
What is timestamp / commitment ordering?
Timestamp - Commands are executed in the order of the timestamps of the requests.
Commitment - An algorithm works out the optimum order in which to execute commands, taking into account the impact of each command, to minimise issues occurring.
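A minimal sketch of the timestamp half of this card only, with hypothetical write requests that arrive out of order. It is simplified to match the card; real timestamp-ordering protocols also keep read and write timestamps on each record and reject transactions that arrive too late.

```python
# Hypothetical write requests: (timestamp, client, new value). They arrive out of order.
arrivals = [(1005, "B", "Bob Smith"), (1001, "A", "Robert Smith"), (1003, "C", "R. Smith")]

def execute_by_timestamp(requests):
    """Apply write commands in timestamp order, regardless of arrival order."""
    field = None
    order = []
    for _, client, value in sorted(requests):  # tuples sort by timestamp first
        field = value                          # execute the write
        order.append(client)
    return field, order

final_value, order = execute_by_timestamp(arrivals)
print(order)        # ['A', 'C', 'B']
print(final_value)  # Bob Smith
```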
What is Big Data?
Term used for data which won't fit in conventional containers, such as a single server or a traditional relational database.
What are the 3 defining features of Big Data?
Volume - Too much data for conventional hard drives or servers - it has to be spread over multiple servers.
Velocity - Data on the servers is created and modified rapidly.
Variety - Data held on the servers consists of many different types of data - binary, multimedia etc.
What is the main problem plaguing Big Data?
Lack of structure - rather than the sheer volume of data.
How is machine learning being used in Big Data?
The unstructured nature of Big Data makes it hard to extract useful information - machine learning is used to discern patterns in the data.
What is Functional Programming?
A programming paradigm built around functions - used in Big Data as a solution to the problem of processing data spread over multiple machines.
How does functional programming work?
Programs are stateless and use immutable data structures.
Supports higher-order functions - functions that take other functions as inputs or return them as outputs.
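A minimal sketch of these ideas in Python, using an immutable tuple of hypothetical sensor readings and the built-in higher-order functions map, filter and functools.reduce. No step modifies shared state, which is the property that lets work like this be split across machines.

```python
from functools import reduce

readings = (12, 7, 30, 18, 3)  # a tuple: an immutable data structure

def above(threshold):
    """Higher-order: returns a new function (a function as output)."""
    return lambda x: x > threshold

# filter, map and reduce take functions as inputs; nothing is modified in place.
large = tuple(filter(above(10), readings))           # (12, 30, 18)
doubled = tuple(map(lambda x: x * 2, large))         # (24, 60, 36)
total = reduce(lambda acc, x: acc + x, doubled, 0)   # 120
print(total)
```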
What is the fact-based model for representing data?
Big Data doesn't store well in the rows and columns of relational tables.
Storing data as immutable facts removes the risk of data being lost through human error and removes the need for an index - new data is simply appended as it is created.
How does the fact-based model work to store data?
Each piece of information is stored as an immutable fact.
Each fact is stored with a timestamp - allowing the computer to use the most recent information.
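A minimal sketch of a fact-based store, assuming hypothetical Fact records made immutable with a frozen dataclass. New information is appended rather than overwriting old facts, and the timestamp decides which value is current.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a fact cannot be changed once created
class Fact:
    entity: str
    attribute: str
    value: str
    timestamp: int

facts = []  # append-only store; facts are never updated or deleted

def record_fact(entity, attribute, value, timestamp):
    facts.append(Fact(entity, attribute, value, timestamp))

def current_value(entity, attribute):
    """Return the most recent value, chosen by timestamp rather than by overwriting."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value

record_fact("user42", "city", "Leeds", 1000)
record_fact("user42", "city", "York", 2000)  # a move is a new fact, not an edit
print(current_value("user42", "city"))  # York
print(len(facts))                       # 2: the old fact is kept
```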
How is Big Data represented using Graph Schema?
Graph Schema - uses a graph of nodes and edges to represent the structure of a dataset - nodes represent entities and edges represent the relationships between them.
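A minimal sketch of a graph schema as plain Python data, assuming hypothetical social-network entities. Nodes carry each entity's properties, and each edge is a directed (source, relationship, destination) triple.

```python
# Hypothetical data: nodes are entities, edges are relationships.
nodes = {
    "p1": {"type": "Person", "name": "Ada"},
    "p2": {"type": "Person", "name": "Alan"},
    "c1": {"type": "City", "name": "London"},
}
edges = [
    ("p1", "FRIENDS_WITH", "p2"),
    ("p1", "LIVES_IN", "c1"),
    ("p2", "LIVES_IN", "c1"),
]

def related(node_id, relationship):
    """Follow outgoing edges of one relationship type from a node."""
    return [nodes[dst]["name"] for src, rel, dst in edges
            if src == node_id and rel == relationship]

print(related("p1", "LIVES_IN"))      # ['London']
print(related("p1", "FRIENDS_WITH"))  # ['Alan']
```

Because the edges here are directed, related("p2", "FRIENDS_WITH") returns an empty list; a symmetric relationship would need an edge in each direction.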