Graph Databases, Column Stores, Key Value Stores and Document Databases Flashcards
relational databases
Primary keys prevent duplicate rows, foreign keys enforce referential integrity, and the normal form rules prevent update, insertion and deletion anomalies, so the data stays consistent as long as these constraints are applied.
However, there may be performance problems if the database is very big, and space may be wasted if the data is very varied, because many columns will hold NULL values.
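A minimal sketch of the two constraints in action, using Python's built-in sqlite3 module. The author and book tables and the function name are made up for illustration. Note that SQLite only checks foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3


def demo_relational_constraints():
    """Return the labels of the inserts that the database rejected."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical schema: every book must point at an existing author.
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE book ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " author_id INTEGER NOT NULL REFERENCES author(id))"
    )
    conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ada')")
    conn.execute("INSERT INTO book (id, title, author_id) VALUES (1, 'Notes', 1)")

    attempts = {
        # Same primary key as an existing author row.
        "duplicate primary key": "INSERT INTO author (id, name) VALUES (1, 'Bob')",
        # author_id 99 does not exist in the author table.
        "dangling foreign key": "INSERT INTO book (id, title, author_id) VALUES (2, 'Ghost', 99)",
    }
    rejected = []
    for label, sql in attempts.items():
        try:
            conn.execute(sql)
        except sqlite3.IntegrityError:
            rejected.append(label)
    conn.close()
    return rejected
```

Both bad inserts should be refused by the database itself, without any checking code in the application.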
graph databases
View data as a mathematical graph of nodes and edges. The relationships between the data items are the key. Nodes do not all have to hold the same attributes. Often used in social networking.
Also used in data science, where the focus is on relationships, and often used as one part of a larger system to show how the data is related. Relationships are followed directly rather than computed with joins, so relationship queries can scale better, but there can still be scaling issues as the graph gets bigger.
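A small sketch of following relationships without joins. The graph is held as a plain adjacency dictionary, and the names in FOLLOWS and the within_hops function are invented for illustration. A real graph database would use its own storage and query language.

```python
from collections import deque

# Hypothetical "follows" relationships: each node maps to the nodes it points at.
FOLLOWS = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan"],
    "dan": ["eve"],
    "eve": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the start node is not part of the answer
    return seen
```

For example, within_hops(FOLLOWS, "ann", 2) walks outwards from "ann" one edge at a time, which is roughly the "friends of friends" query that would need a self-join per hop in a relational database.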
column stores
Have tables like relational databases, however the data is stored by column rather than by row.
Some column stores use SQL in the traditional manner to permit querying, and some allow programmatic access to the data. Not really NoSQL databases, although some vendors, e.g. Amazon, view them as such.
Searching within a specific column is much faster: a query that wants the values in a particular column can retrieve them very quickly because it does not need to pass over data from other columns. Used in OLAP and data warehousing, where analysis is the most common operation and the data doesn't change very often. However, retrieving a specific row is much slower, because a value has to be extracted from each column. Likewise, adding a new row is slower, because a new value has to be added to each of the columns rather than making a single addition as in a row-based relational database.
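The trade-off can be sketched with plain Python lists, one list per column. The sample data and function names are assumptions for illustration and say nothing about how any particular column store is implemented.

```python
# Hypothetical table, shown first as rows.
ROWS = [
    {"id": 1, "region": "north", "sales": 120},
    {"id": 2, "region": "south", "sales": 80},
    {"id": 3, "region": "north", "sales": 200},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_sales(columns):
    # Analytic query: only the "sales" column is read.
    return sum(columns["sales"])


def fetch_row(columns, index):
    # Row lookup: one read from every column.
    return {name: values[index] for name, values in columns.items()}


def append_row(columns, row):
    # Insert: one write to every column.
    for name, values in columns.items():
        values.append(row[name])
```

The aggregate touches a single list, whereas fetching or appending one row touches every list, which mirrors the fast column access and slower row access described above.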
hashing
Key-value stores are based on hashing. The idea is that a "key", which can be quite simple, is mapped to some other value, which can be more complex. A hash function is applied to the key to determine a storage location for the value. The storage data structure is known as a hash table.
Can think of a hash table as being like an array, so access to the elements is O(1) on average. However, a poor hash function can lead to many collisions, where different keys map to the same location, which slows access down.
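A minimal sketch of a hash table, assuming collisions are handled by chaining (each array slot holds a small list). The class name and the hash_fn parameter are illustrative. Python's own dict uses a different collision strategy.

```python
class ChainedHashTable:
    """Toy hash table: an array of buckets, each bucket a list of (key, value)."""

    def __init__(self, bucket_count=8, hash_fn=hash):
        self.buckets = [[] for _ in range(bucket_count)]
        self.hash_fn = hash_fn

    def _bucket(self, key):
        # The hash of the key, reduced to an array index, picks the bucket.
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # same key: replace the old value
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        raise KeyError(key)

    def longest_chain(self):
        # Number of entries in the fullest bucket.
        return max(len(bucket) for bucket in self.buckets)
```

With a hash function that spreads keys evenly, each lookup inspects a very short chain. Passing a deliberately poor function such as lambda key: 0 puts every key in one bucket, so a lookup degrades to a linear search.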
key-value stores
Also known as a dictionary (the underlying structure is a hash table). The idea is that a simple key maps to a more complex value. Is like an array indexed by keys rather than by position. The values are the records in the table. Unlike in a relational DB, the records can have different fields to each other.
the keys (key-value stores)
Keys need to be unique so that each key identifies exactly one value and two records never collide on the same key. Some databases allow users to define arbitrary keys whereas others set limits. Keys should not be too long, for performance reasons.
the values (key-value stores)
The values can be anything; different databases permit different data types, and some databases don't enforce data types at all. Values can be basic, such as numbers or text, but can also be more complex, such as sets, lists, code, images or even further key-value pairs encapsulated in an object.
advantages of key-value
Performance: recall that hashing is O(1). There is no need to ensure consistency, i.e. referential integrity. Querying is a simple lookup, so it can be very quick. Can have potentially very large numbers of records; if you know the key then you can get the value very quickly no matter how many records there are.
Data variation: recall that all the values can be different to each other. If the data is highly variable, a relational DB would have a lot of NULL values.
disadvantages of key-value
Lack of referential integrity: data held in different values can contradict each other, and there is no equivalent of foreign keys to keep related values in step if the data changes.
Basic querying only: querying is simply looking up values using the keys and returning the values. There are three operations - PUT, GET and DELETE. Cannot do the more complex queries that can be done with SQL in relational databases.
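The three operations can be sketched on top of a Python dict. KeyValueStore and users_in_city are hypothetical names, and real key-value databases expose their own client APIs.

```python
class KeyValueStore:
    """Toy store offering only PUT, GET and DELETE."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # overwrites any existing value for the key

    def get(self, key):
        return self._data[key]  # raises KeyError if the key is unknown

    def delete(self, key):
        del self._data[key]


def users_in_city(store, user_keys, city):
    """'Query' by value: the application must GET each record and filter itself."""
    matches = []
    for key in user_keys:
        record = store.get(key)
        if record.get("city") == city:
            matches.append(key)
    return matches
```

The store never looks inside a value, so anything resembling a WHERE clause has to be written in application code, and the application has to know which keys to ask for.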
document databases
Document databases use the document-oriented model to store data. Documents are self-contained, and each document might have a different number of fields, and different fields, to other documents. Document databases are similar to key-value stores in that there are keys, usually a text string, each of which identifies a document. However, in a key-value store a GET returns the entire value, whereas a document can be queried to return only certain parts of it, so more specific querying is possible.
The documents might be in a variety of formats, e.g. JSON, XML or even the likes of PDF. This is semi-structured data. Documents are similar to objects in object-oriented programming. Basic operations are permitted - create, read, update, delete (CRUD). Document databases are NoSQL and can be accessed via programming languages.
Good if the data is varied, as the documents can all contain different fields. The key-value storage approach offers quick performance: documents are fast to load and store, as there is no need to join tables. However, there are basic operations only and no normalisation at all, so the same information can be duplicated in different documents, which wastes storage space and allows inconsistencies to arise.
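A rough sketch of querying inside documents, using nested Python dicts in place of JSON. The order documents, the dotted-path convention and the function names are all assumptions for illustration rather than any product's query syntax.

```python
# Hypothetical documents keyed by a text string; the fields differ between them.
DOCUMENTS = {
    "order:1": {"customer": {"name": "Ann", "city": "Leeds"}, "items": ["pen", "ink"]},
    "order:2": {"customer": {"name": "Ann", "city": "York"}, "total": 12.5},
}


def get_path(document, path, default=None):
    """Follow a dotted path such as 'customer.city' into nested dicts."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def project(documents, path):
    """Return just one part of every document, keyed by document key."""
    return {key: get_path(doc, path) for key, doc in documents.items()}


def find(documents, path, expected):
    """Return the keys of the documents whose value at path equals expected."""
    return [key for key, doc in documents.items() if get_path(doc, path) == expected]
```

Projecting "customer.city" returns only that part of each document. It also exposes the duplication problem: the customer is copied into both orders, and the two copies disagree about the city.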