NoSQL Flashcards
what are the 4 types of data
structured and unstructured
dynamic and static
dynamic
changing frequently
static
never changes
structured
formal, predefined format
easy to store and process
unstructured
e.g. audio, image, music
usually still has internal structural properties
sharding data
splitting the data to allow concurrent/parallel access using multiple machines
can simultaneously access each shard
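A minimal sketch of how documents might be routed to shards by hashing a key. The helper names (`shardFor`, `shardDocuments`), the hash function and the sample users are assumptions for illustration; real databases use their own partitioning schemes.

```javascript
// Hypothetical helper: map a key to one of `shardCount` shards with a simple string hash.
function shardFor(key, shardCount) {
  let hash = 0;
  for (const ch of String(key)) {
    // Keep the running hash inside the unsigned 32-bit range.
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return hash % shardCount;
}

// Split documents into shards; each shard could live on its own machine
// and be read or written at the same time as the others.
function shardDocuments(docs, shardCount, keyField) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const doc of docs) {
    shards[shardFor(doc[keyField], shardCount)].push(doc);
  }
  return shards;
}

// Sample data (made up for this example).
const users = [
  { _id: "u1", name: "Ana" },
  { _id: "u2", name: "Ben" },
  { _id: "u3", name: "Cho" },
  { _id: "u4", name: "Dee" },
];

const shards = shardDocuments(users, 3, "_id");
console.log(shards.map((shard) => shard.length));
```

The same key always hashes to the same shard, so a lookup by key only has to contact one machine.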
in which two ways can we scale databases
vertically and horizontally
vertical scaling
upgrading hardware e.g. increasing memory
what is the limitation of vertical scaling
limited by the amount of CPU, RAM, disk, etc. that can be configured on a single machine
horizontal scaling
adding more machines, which requires sharding and replication so you can work with them simultaneously
what is the limitation of horizontal scaling
read-to-write ratio and communication overhead
in which three ways can we benefit from parallelisation
maximise the fraction of the program that can be parallelised
balance the workload across the parallel processes
minimise the time spent on communication
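The first point can be made concrete with Amdahl's law, which is not named in these cards but is the standard formula for it: speedup = 1 / ((1 - p) + p / n), where p is the parallelisable fraction and n the number of workers. The function name below is an assumption, and the sketch ignores communication cost.

```javascript
// Amdahl's law: the serial part (1 - p) limits the speedup no matter how many workers are added.
function amdahlSpeedup(parallelFraction, workers) {
  if (parallelFraction < 0 || parallelFraction > 1) {
    throw new RangeError("parallelFraction must be between 0 and 1");
  }
  if (workers < 1) {
    throw new RangeError("workers must be at least 1");
  }
  return 1 / ((1 - parallelFraction) + parallelFraction / workers);
}

// Same 8 workers, different parallel fractions.
console.log(amdahlSpeedup(0.5, 8).toFixed(2));  // about 1.78
console.log(amdahlSpeedup(0.95, 8).toFixed(2)); // about 5.93
```

Raising the parallel fraction from 0.5 to 0.95 more than triples the speedup on the same hardware.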
how does the two phase commit protocol work
the coordinator requests a vote to commit and the participants either approve or reject
if all participants accept then everything gets committed at the same time; if any participant rejects, everyone aborts
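A small in-memory sketch of the two phases. `makeParticipant` and `twoPhaseCommit` are hypothetical names, and real implementations also handle timeouts, crashes and logging, which are left out here.

```javascript
// Hypothetical participant that votes yes or no when asked to prepare.
function makeParticipant(name, canCommit) {
  return {
    name,
    state: "init",
    prepare() {
      this.state = canCommit ? "prepared" : "rejected";
      return canCommit;
    },
    commit() {
      this.state = "committed";
    },
    abort() {
      this.state = "aborted";
    },
  };
}

function twoPhaseCommit(participants) {
  // Phase 1: the coordinator collects a vote from every participant.
  const votes = participants.map((p) => p.prepare());
  // Phase 2: commit everywhere only if every vote was yes, otherwise abort everywhere.
  const decision = votes.every((vote) => vote === true) ? "commit" : "abort";
  for (const p of participants) {
    if (decision === "commit") {
      p.commit();
    } else {
      p.abort();
    }
  }
  return decision;
}

const allReady = [makeParticipant("a", true), makeParticipant("b", true)];
const oneBusy = [makeParticipant("a", true), makeParticipant("b", false)];
console.log(twoPhaseCommit(allReady)); // commit
console.log(twoPhaseCommit(oneBusy));  // abort
```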
what is the issue with two phase commit
hard to find a time where all servers are ready to commit
what is the CAP theorem
any distributed database with shared data can have at most 2 of the 3 properties
usually sacrificing consistency
what are the three components of cap theorem
consistency; every node always sees the same data at the same time
availability; the system continues operating even if nodes crash or software or hardware is down
partition tolerance; the system keeps working even when the network between nodes is split (messages lost or delayed)
what are the BASE properties
basically available; the system guarantees availability
soft state; system state may change over time
eventual consistency; will eventually become consistent
what does it mean for a db to be eventually consistent
if all replicas will gradually become consistent in the absence of updates
what makes NoSQL NoSQL
no strict schema requirements
no strict adherence to ACID properties
consistency is traded in favour of availability
document database/store
loosely structured sets of key-value pairs; documents encapsulate and encode data in a standard format/encoding (e.g. JSON, BSON, XML)
treated as a whole
query languages can help retrieve documents based on their contents
addressed in the db via the unique key
in mongo what is the primary key
the "_id" field
sorted ordered column-oriented stores
data is stored in column families (groups of columns) rather than tables
each unit of data is a set of key value pairs identified by row-key
graph db
everything is stored as an edge, a node or an attribute
each node and edge can have any number of attributes and can be labelled, which narrows searches
what do document db use instead of an fk
embedded documents and referencing
what can be a value in document db
any data type
references
including links from one document in another which normalises the db
what are some benefits of using referencing
can represent more complex many-to-many relationships
good for large hierarchical datasets
what is a negative of using referencing
requires follow up queries to find all the data you need
embedded data
having a doc inside another, as a subdocument or in an array
embedded data positive
can get all the data in one call, using fewer queries
negative of embedded data
the db isn't normalised and not all values are atomic
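The trade-off between referencing and embedding can be sketched with plain JavaScript objects. The collections, field names and the `resolveCustomer` helper are made up for illustration.

```javascript
// Referencing: the order only stores the customer's _id (normalised).
const customers = [{ _id: 1, name: "Ana", city: "Leeds" }];
const ordersReferenced = [{ _id: 100, customerId: 1, items: ["tea"] }];

// A follow-up lookup is needed to get the customer's details.
function resolveCustomer(order) {
  return customers.find((c) => c._id === order.customerId);
}

// Embedding: the customer data sits inside the order, so one read returns everything,
// but the same customer would be duplicated in every one of their orders.
const ordersEmbedded = [
  { _id: 100, customer: { name: "Ana", city: "Leeds" }, items: ["tea"] },
];

console.log(resolveCustomer(ordersReferenced[0]).name); // second step needed
console.log(ordersEmbedded[0].customer.name);           // available immediately
```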
data model
displays a set of tables and the relationship between them providing a blueprint so you can identify which data is important and what should be maintained
which two parts of the CAP approach does mongodb focus on
consistency and partition tolerance
what are the different parts of the mongodb structure and how do they relate to each other
an instance has 0/more databases
a database has 0/more collections
a collection has 0/more documents
a document has 1/more fields/attributes
what is mongosh
an interactive shell that is a fully functional JavaScript interpreter
in mongo what happens when you USE a db but it doesn't exist
mongo switches to it and creates it once data is first stored in it
db.dropDatabase()
deletes the current db
show dbs
shows your databases
what does db.collection.findOne({"title": /c/}) do
returns the first document whose title contains the letter c (the value is matched as a regular expression)
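A plain-JavaScript stand-in for this query, runnable without a database. The `books` array and the `findOneByTitle` helper are assumptions; in mongosh the equivalent call would be the findOne() shown on the card.

```javascript
// Sample documents (made up for this example).
const books = [
  { _id: 1, title: "Dune" },
  { _id: 2, title: "Neuromancer" },
  { _id: 3, title: "Contact" },
];

// Imitates findOne({ title: pattern }): the first document whose title matches, or null.
function findOneByTitle(docs, pattern) {
  const match = docs.find((doc) => pattern.test(doc.title));
  return match === undefined ? null : match;
}

console.log(findOneByTitle(books, /c/));  // first title containing a lowercase "c"
console.log(findOneByTitle(books, /^C/)); // first title starting with "C"
console.log(findOneByTitle(books, /z/));  // null, nothing matches
```

Only one document comes back even though two titles contain a "c".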
mongod
the MongoDB server process (a db instance)
CRUD
create
read
update
delete
db.collection.insertOne()
inserts a single document into the collection and if the collection doesn't exist then mongo creates it
what do backticks (``) do
create a JavaScript template literal; any ${...} expression within it is evaluated and inserted into the string
db.collection.find()
prints all docs in the collection
db.collection.find(<query>)
prints all docs in the collection that match the query
db.collection.find(<query>).count()
prints the number of docs in the collection that match the query
db.collection.findOne()
prints the first doc that matches the query
what happens if the projection and query are both null (db.collection.findOne(<query>, <projection>))
prints out all the fields in the first document
db.collection.updateOne(<query>, <update>, <options>)
updates the first document matching the query
db.collection.updateMany(<query>, <update>, <options>)
updates all documents matching the query
in which two ways do we enforce constraints in mongodb
validators and indexes
validator
ensure fields meet specific criteria
e.g. required, not null, datatype
indexes
can enforce uniqueness (providing multiplicity constraints) and speed up searching by organising data
e.g. unique
index 1
ascending
index -1
descending
how do we use indexes to create m:n relationships
create a unique compound index on the two reference fields of the join collection
validator + index
exactly 1
using index means
at most one
using validator means
at least one
what is the main difference between validators and indexes when being created
validators are usually defined when the collection is created (they can also be added later with collMod), whilst indexes can be created at any time afterwards
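A sketch of how a validator and a unique compound index might be combined for a join collection. The collection name `enrolments` and the fields `studentId` and `courseId` are hypothetical. The `db` calls only run inside mongosh; in plain Node the guard skips them and the script just prints the two definitions.

```javascript
// Hypothetical collection and field names, for illustration only.
const collectionName = "enrolments";

// Validator: both reference fields must be present ("at least one").
const validator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["studentId", "courseId"],
    properties: {
      studentId: { bsonType: "string" },
      courseId: { bsonType: "string" },
    },
  },
};

// Unique compound index: each (studentId, courseId) pair can appear "at most once".
const indexKeys = { studentId: 1, courseId: 1 }; // 1 = ascending
const indexOptions = { unique: true };

// `db` only exists inside mongosh; elsewhere these calls are skipped.
if (typeof db !== "undefined") {
  db.createCollection(collectionName, { validator });
  db.getCollection(collectionName).createIndex(indexKeys, indexOptions);
}

console.log(JSON.stringify({ validator, indexKeys, indexOptions }, null, 2));
```

Together the two constraints give "exactly one" document per pair.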
what is the key symbol for a query operator
$
$gt
greater than
$eq
equal
$and
AND
$or
OR
$regex
regular expression
what does {} signify in a findOne()
the query condition; an empty {} matches every document
how can we ensure total participation using a validator
required: [x,y]
,
implicit AND
what does a regex start and end with
/ and /; inside the pattern, ^ anchors the start and $ anchors the end of the string
^ meaning in regex
start of the string
. meaning in regex
any character
{5,} meaning in regex
at least 5 repetitions of the preceding element
i meaning in regex
case insensitive
?! meaning in regex
negative lookahead; checks if the pattern does not follow the current position
?= meaning in regex
positive lookahead; checks if the pattern follows the current position
- meaning in regex
inside a character class [ ], denotes a range (e.g. [a-z]); elsewhere it is a literal hyphen
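The regex symbols on these cards can be tried out in plain JavaScript, which uses the same pattern syntax as mongosh regex literals. The variable names and sample strings are made up for illustration.

```javascript
// ^ anchors the start, $ the end, . is any character, {5,} means at least 5 repetitions.
const atLeastFive = /^.{5,}$/;
console.log(atLeastFive.test("mongo")); // true, exactly 5 characters
console.log(atLeastFive.test("sql"));   // false, only 3 characters

// The i flag makes the match case insensitive.
const startsWithMongo = /^mongo/i;
console.log(startsWithMongo.test("MongoDB")); // true

// (?=...) positive lookahead: a digit must appear somewhere ahead.
const hasDigit = /^(?=.*\d).{5,}$/;
console.log(hasDigit.test("mongo7")); // true
console.log(hasDigit.test("mongo"));  // false, no digit

// (?!...) negative lookahead: "sql" must not appear anywhere ahead.
const noSql = /^(?!.*sql).{5,}$/i;
console.log(noSql.test("mongodb"));      // true
console.log(noSql.test("MySQL server")); // false, contains "sql" (case insensitive)
```

A lookahead checks a condition without consuming characters, so several can be stacked at the start of one pattern.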
what is the cursor in mongodb
a pointer to the result set returned by find()
how do you retrieve documents as a constant value from the cursor
using loops, e.g. forEach(), or while with hasNext() and next()
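A sketch of looping over a cursor. `makeCursor` is a hypothetical in-memory stand-in that only imitates the hasNext() and next() methods of a mongosh cursor, so the loop can run without a database.

```javascript
// Hypothetical stand-in for the cursor returned by find().
function makeCursor(docs) {
  let position = 0;
  return {
    hasNext: () => position < docs.length,
    next: () => docs[position++],
  };
}

const cursor = makeCursor([{ name: "tea" }, { name: "bread" }, { name: "jam" }]);

// Pull documents out one at a time until the cursor is exhausted.
const names = [];
while (cursor.hasNext()) {
  const doc = cursor.next();
  names.push(doc.name);
}

console.log(names);
```

Once the loop finishes the cursor is used up, so the documents have to be kept in a variable such as `names` if they are needed again.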
.sort({item: 1/-1})
sorts the returned items in ascending or descending order
how do we show which attributes we want to be included in the result of find()
in the projection, the second argument of find(), after the query
e.g. name: 1
how do we omit the _id from our find() result
_id: 0
combining 0 with 1s in the same projection is only allowed for _id, as it's included automatically
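A plain-JavaScript imitation of an inclusion projection, to show how _id behaves. The `project` helper and the sample document are assumptions, and the helper handles only simple inclusion projections such as { name: 1 }.

```javascript
// Imitates an inclusion projection: keep fields flagged 1, plus _id unless it is flagged 0.
function project(doc, projection) {
  const result = {};
  // _id is included automatically unless the projection says _id: 0.
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Sample document (made up for this example).
const item = { _id: 7, name: "tea", type: "food", price: 3 };

console.log(project(item, { name: 1 }));         // { _id: 7, name: 'tea' }
console.log(project(item, { name: 1, _id: 0 })); // { name: 'tea' }
```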
how do we use IN in mongodb
type: {$in: ['food']}
how do we search in an embedded document
for an exact match, give all the subdocument's fields in the same order in your query; to match on a single nested field, use dot notation (e.g. 'parent.child')
how do we get the first value in an array when writing a query
e.g. 'ratings.0'
what does $elemMatch do
matches documents where at least one array element satisfies the entire condition
why do we need to use $elemMatch
if we don't and we have multiple conditions, each condition can be satisfied by a different array element, so no single element has to meet them all
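The difference can be imitated in plain JavaScript. The two helper functions are hypothetical; they mirror how a query like { ratings: { $gt: 4, $lt: 6 } } behaves on an array with and without $elemMatch.

```javascript
// Without $elemMatch: each condition may be met by a different array element.
function matchesSeparately(values) {
  return values.some((v) => v > 4) && values.some((v) => v < 6);
}

// With $elemMatch: a single element must satisfy both conditions at once.
function matchesElemMatch(values) {
  return values.some((v) => v > 4 && v < 6);
}

console.log(matchesSeparately([2, 9])); // true: 9 is > 4 and 2 is < 6
console.log(matchesElemMatch([2, 9]));  // false: no single value is between 4 and 6
console.log(matchesElemMatch([2, 5]));  // true: 5 satisfies both conditions
```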
what is the mongo equivalent of group by having
$group followed by $match
aggregation pipeline
documents go through a pipeline of operations until aggregated
how do you use count to find the number of documents in the collection
$count: "name"
how do you count a specific attribute in a document in a collection
$match before counting
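A sketch of the GROUP BY ... HAVING idea. The `pipeline` array uses real stage syntax, while the `sales` data and the `groupByType` helper are a made-up plain-JavaScript imitation so the result can be checked without a database.

```javascript
// Sample documents (made up for this example).
const sales = [
  { type: "food", amount: 5 },
  { type: "food", amount: 7 },
  { type: "drink", amount: 3 },
];

// The aggregation pipeline being imitated: GROUP BY type HAVING total > 4.
const pipeline = [
  { $group: { _id: "$type", total: { $sum: "$amount" } } },
  { $match: { total: { $gt: 4 } } },
];

// Stage 1 ($group): sum the amounts per type.
function groupByType(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.type, (totals.get(doc.type) || 0) + doc.amount);
  }
  return [...totals].map(([type, total]) => ({ _id: type, total }));
}

// Stage 2 ($match): keep only the groups that pass the condition.
const result = groupByType(sales).filter((group) => group.total > 4);

console.log(result); // [ { _id: 'food', total: 12 } ]
```

Placing $match after $group filters the groups (HAVING); placing it before would filter the input documents (WHERE).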
db.createView(viewName, collection, pipeline, options)
creates a read-only view: its contents are computed from the pipeline each time it is queried and are not stored in the database
$lookup
performs a left outer join with another collection on a common field; the matching documents are added as an array field
$unwind
deconstructs/flattens an array into individual docs
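A plain-JavaScript imitation of what $lookup and $unwind do to documents. The `lookup` and `unwind` helpers and the sample data are assumptions, and `unwind` follows the default behaviour of dropping documents whose array is empty.

```javascript
// Sample collections (made up for this example).
const authors = [
  { _id: 1, name: "Ana" },
  { _id: 2, name: "Ben" },
];
const books = [
  { title: "A", authorId: 1 },
  { title: "B", authorId: 1 },
];

// $lookup-like: left outer join; every left doc is kept and gets an array of matches.
function lookup(left, right, localField, foreignField, as) {
  return left.map((doc) => ({
    ...doc,
    [as]: right.filter((r) => r[foreignField] === doc[localField]),
  }));
}

// $unwind-like: one output doc per array element; docs with an empty array are dropped.
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((element) => ({ ...doc, [field]: element })));
}

const joined = lookup(authors, books, "_id", "authorId", "books");
const flat = unwind(joined, "books");

console.log(joined.map((a) => [a.name, a.books.length])); // Ana has 2 matches, Ben has 0
console.log(flat.map((a) => [a.name, a.books.title]));    // one row per (author, book) pair
```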