MongoDB Flashcards
Describe the Document Database paradigm and its features.
Document databases are structured as collections containing documents. These documents can be accessed by a key (key-value style) or by their own content (querying the values of their fields).
The pros of document databases are the ease of storing semi-structured data that would otherwise require intensive use of null values (as in a relational table), being programmer friendly (simple syntax and an implementation close to the conceptual model) and being web-oriented (the CRUD operations map naturally onto the HTTP methods POST, GET, PUT and DELETE).
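To make the two access modes concrete, here is a minimal plain-JavaScript sketch (no MongoDB involved; the `products` collection and the `findById`/`findByContent` helpers are hypothetical). It shows documents with different fields living in the same collection, with no null placeholders, and being retrieved either by key or by content.

```javascript
// Hypothetical in-memory "collection": each document is a plain object.
// Documents in the same collection may have different fields, so there is
// no need to store null for attributes a document does not have.
const products = [
  { _id: 1, name: "Laptop", price: 1200, specs: { ram: "16GB" } },
  { _id: 2, name: "E-book", price: 9, format: "EPUB" },
  { _id: 3, name: "T-shirt", price: 15, sizes: ["S", "M", "L"] },
];

// Access by key (key-value style): look a document up by its _id.
function findById(collection, id) {
  return collection.find((doc) => doc._id === id) || null;
}

// Access by content: return every document whose fields match the filter.
// Only top-level equality is supported in this sketch.
function findByContent(collection, filter) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
}
```

In the mongo shell the equivalent lookups would be `db.products.find({ _id: 2 })` and `db.products.find({ format: "EPUB" })`.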
Examples of document databases are CouchDB, MongoDB, Firebase and Elasticsearch.
Enumerate the best use cases for MongoDB, or for document databases in general.
- Content management
- IoT
- Overall, a solid paradigm to start with (when it is unclear which paradigm to choose)
!! Graphs (especially when data must be related and joined) == NOT GOOD!
Define architecture concepts such as mongod and the multi-node architecture, and explain the purpose of an Arbiter.
mongod is the MongoDB server process (daemon) that manages data access; an instance runs on every node.
A multi-node architecture (a replica set) is a group of mongod instances that keep copies of the same data set.
Multi-node architecture:
- Primary: receives all write operations and holds the main copy of the data
- Secondary: replicates the Primary, holding secondary copies of its data
- Arbiter: holds no data; if the Primary goes down, it votes to decide which Secondary will become the new Primary.
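A minimal sketch of how a Secondary keeps a copy of the Primary's data, assuming a simplified model in which the Primary records every write in an operation log (MongoDB calls it the oplog) and replication is triggered manually by `replicate()`; real secondaries pull oplog entries asynchronously. The `Member` and `ReplicaSet` classes are hypothetical, and the Arbiter is not modelled because it holds no data.

```javascript
// Hypothetical sketch of a replica set: the Primary accepts writes and
// records them in an operation log (oplog); Secondaries copy the Primary's
// data by replaying the oplog entries they have not applied yet.
class Member {
  constructor(name) {
    this.name = name;
    this.data = {};   // this member's copy of the data set
    this.applied = 0; // number of oplog entries already applied
  }
}

class ReplicaSet {
  constructor(primaryName, secondaryNames) {
    this.primary = new Member(primaryName);
    this.secondaries = secondaryNames.map((n) => new Member(n));
    this.oplog = [];
  }

  // Only the Primary accepts writes; each write is appended to the oplog.
  write(key, value) {
    this.primary.data[key] = value;
    this.oplog.push({ op: "set", key, value });
  }

  // Each Secondary applies the oplog entries it is missing, in order.
  replicate() {
    for (const s of this.secondaries) {
      while (s.applied < this.oplog.length) {
        const entry = this.oplog[s.applied];
        s.data[entry.key] = entry.value;
        s.applied += 1;
      }
    }
  }
}
```

Replaying entries strictly in oplog order is what lets every Secondary converge to the same state as the Primary.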
Describe the sharding process and all the concepts involved.
Sharding is an optional way of distributing data across several nodes that MongoDB supports. Documents themselves are not split (each document is an indivisible unit); instead, a collection is divided according to a partition key (the shard key) into groups of documents (chunks), and the chunks are placed on shards (a shard is a replica set that stores part of the database's data).
How the chunks are distributed across the shards depends on the shard key and how it is used (the shard key must be an indexed field, or set of fields, of the documents).
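The idea that whole documents are grouped into chunks by shard key range, and that chunks (not documents) are what get placed on shards, can be sketched as below. This is a simplified model under stated assumptions: the function names are hypothetical, chunk size is a document count rather than MongoDB's size in megabytes, shard key values are assumed distinct, and a simple round-robin placement stands in for MongoDB's balancer.

```javascript
// Hypothetical sketch: group whole documents into chunks by shard-key range,
// then place each chunk on a shard. Documents are never split.
// Chunk size is a document count here for simplicity (MongoDB uses a size
// in megabytes); shard key values are assumed to be distinct.
function buildChunks(docs, shardKey, maxDocsPerChunk) {
  const sorted = [...docs].sort((a, b) => a[shardKey] - b[shardKey]);
  const chunks = [];
  for (let i = 0; i < sorted.length; i += maxDocsPerChunk) {
    const docsInChunk = sorted.slice(i, i + maxDocsPerChunk);
    chunks.push({
      min: docsInChunk[0][shardKey],                      // smallest key in chunk
      max: docsInChunk[docsInChunk.length - 1][shardKey], // largest key in chunk
      docs: docsInChunk,
    });
  }
  return chunks;
}

// Round-robin placement so every shard receives a similar number of chunks.
function assignChunks(chunks, shardNames) {
  const placement = {};
  for (const name of shardNames) placement[name] = [];
  chunks.forEach((chunk, i) => {
    placement[shardNames[i % shardNames.length]].push(chunk);
  });
  return placement;
}
```

For six users with `userId` 1 to 6 and two documents per chunk, this yields the chunks [1, 2], [3, 4] and [5, 6], with the first and third on one shard and the second on the other.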
Compare the two types of sharding (Range-Based Sharding and Hash-Based Sharding)
> Range-Based Sharding = chunks are contiguous ranges of shard key values
- Documents with close shard key values will probably be on the same shard (good for range queries)
- Many documents falling in the same shard key range (e.g. with monotonically increasing keys) => that shard gets overloaded.
> Hash-Based Sharding = chunks are built from a hash of the shard key value
- Documents are distributed “randomly” across the shards
- Good load balance
- Slower range queries (they must access multiple shards)
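The trade-off between the two strategies can be illustrated with a small simulation that counts how many shards a range query has to touch. Everything here is an assumption for illustration: the shard names, the range boundaries, and the 32-bit multiplicative hash, which is a hypothetical stand-in for the hash MongoDB actually computes on the shard key value.

```javascript
// Hypothetical sketch contrasting range-based and hash-based placement.
// Shard names, range boundaries and the hash function are assumptions;
// MongoDB computes its own hash of the shard key value.
const SHARDS = ["shard0", "shard1", "shard2"];

// Range-based: each shard owns a contiguous range [min, max) of key values.
const RANGES = [
  { shard: "shard0", min: -Infinity, max: 100 },
  { shard: "shard1", min: 100, max: 200 },
  { shard: "shard2", min: 200, max: Infinity },
];

function rangeShardFor(key) {
  return RANGES.find((r) => key >= r.min && key < r.max).shard;
}

// Hash-based: a multiplicative hash scatters neighbouring keys.
function hashShardFor(key) {
  const h = Math.imul(key, 0x9e3779b1) >>> 0; // unsigned 32-bit hash
  return SHARDS[h % SHARDS.length];
}

// Which shards must be contacted to answer the range query lo <= key < hi?
function shardsForRange(lo, hi, shardFor) {
  const touched = new Set();
  for (let key = lo; key < hi; key++) touched.add(shardFor(key));
  return [...touched].sort();
}
```

With range placement, the keys 0 and 1 sit on a single shard, but every new monotonically increasing key above 200 also lands on `shard2`, which is the overload described above. With hash placement, even those two neighbouring keys end up on different shards, so the range query has to contact more shards.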
Explain the difference between including and not including the shard key in a query.
- If the shard key is included, the query is routed only to the shards that hold the required documents (targeted query).
- If the shard key is NOT included, the query is broadcast to all the shards (scatter-gather), which is inefficient.
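The routing decision can be sketched as a single function. This assumes a hypothetical chunk map (three key ranges on three shards) and only handles equality on the shard key; real routers also target range conditions on the shard key.

```javascript
// Hypothetical sketch of how a router decides which shards to query.
// chunkMap: chunk ranges [min, max) of the shard key and the shard holding each.
const chunkMap = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Returns the shards a query must be sent to.
function targetShards(filter, shardKey, chunks) {
  if (Object.prototype.hasOwnProperty.call(filter, shardKey)) {
    // Targeted query: only the chunk containing the key value is relevant.
    const value = filter[shardKey];
    const chunk = chunks.find((c) => value >= c.min && value < c.max);
    return [chunk.shard];
  }
  // Scatter-gather: without the shard key, every shard must be asked.
  return [...new Set(chunks.map((c) => c.shard))];
}
```

A filter such as `{ customerId: 1500 }` (with `customerId` as the shard key) reaches only `shardB`, while `{ status: "A" }` has to be sent to all three shards.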
Describe the types of nodes related to sharding
- mongos => routing service between the app and the sharded cluster
- Config server => stores the cluster metadata (which chunks live on which shard), which mongos uses to contact the correct shards for each user operation
Explain the process by which a user accesses data.
- Clients (apps) connect to a router (mongos) of the cluster.
- mongos consults the config server to determine where the requested data is and where new data should be written.
- Through mongos, the app reaches the appropriate shard(s).
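The access flow above (app, then mongos, then config server metadata, then shard) can be sketched with two hypothetical classes. The sketch assumes mongos keeps the routing metadata in an in-memory cache, which also shows why, as the redundancy card says, shards stay reachable when the config servers are down as long as mongos is not restarted.

```javascript
// Hypothetical sketch of the access path: app -> mongos -> config server -> shard.
class ConfigServer {
  constructor(chunkMap) {
    this.chunkMap = chunkMap; // metadata: which shard owns which key range
    this.up = true;
  }
  getChunkMap() {
    if (!this.up) throw new Error("config server unavailable");
    return this.chunkMap;
  }
}

class Mongos {
  constructor(configServer, shards) {
    this.configServer = configServer;
    this.shards = shards; // shard name -> array of documents
    this.cache = null;    // routing metadata cached in memory
  }
  restart() {
    this.cache = null;    // a restart loses the in-memory cache
  }
  findByKey(shardKey, value) {
    // Ask the config server only when no metadata is cached yet.
    if (this.cache === null) this.cache = this.configServer.getChunkMap();
    const chunk = this.cache.find((c) => value >= c.min && value < c.max);
    return this.shards[chunk.shard].filter((doc) => doc[shardKey] === value);
  }
}
```

Once `findByKey` has loaded the metadata, the router no longer needs the config server for reads; after `restart()` it needs the config server again before it can route anything.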
Describe the redundancy of the nodes in relation to data access (mongos, config servers and shards)
A well-designed deployment can be:
3 config servers: access is still granted if 1 or 2 of them go down
- Distribute the workload
- Writes to these servers only happen when the metadata changes (e.g. a chunk of a collection moves to another shard)
- If all 3 go down, the shards can still be accessed as long as the mongos instances are not restarted (they rely on their cached metadata).
2 or more mongos
- Distribute the workload of access by many apps.
2 or more shards
- Distribute documents across several nodes of the cluster. Each shard has a primary mongod and several secondary mongod instances.
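Define heartbeat detection
- Nodes continuously exchange messages (heartbeats) indicating which nodes are still “alive”.
- If the Primary node goes down, the secondary nodes hold an election and vote among themselves for a new Primary.
- If the vote would end in a draw, the Arbiter's vote breaks the tie (an Arbiter votes but holds no data, keeping the number of voters odd).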
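Heartbeat detection and the Arbiter's role can be sketched as follows. The 10-second timeout mirrors the default MongoDB documents for marking a member as unreachable; everything else is a simplification and hypothetical: member objects are plain records, and every live member simply votes for the highest-priority eligible candidate. The key rule is that a candidate needs a strict majority of all voting members.

```javascript
// Hypothetical sketch of heartbeat-based failure detection and an election.
// A member counts as alive if its last heartbeat is recent enough
// (10 s here, mirroring MongoDB's documented default timeout).
const HEARTBEAT_TIMEOUT_MS = 10000;

function isAlive(member, nowMs) {
  return nowMs - member.lastHeartbeatMs <= HEARTBEAT_TIMEOUT_MS;
}

// members: { name, lastHeartbeatMs, priority, arbiter }
// Returns the name of the new Primary, or null if no majority is reached.
function electPrimary(members, nowMs) {
  const alive = members.filter((m) => isAlive(m, nowMs));
  // Candidates: live data-bearing members with priority > 0.
  const candidates = alive
    .filter((m) => !m.arbiter && m.priority > 0)
    .sort((a, b) => b.priority - a.priority);
  if (candidates.length === 0) return null;
  // In this sketch every live member (arbiters included) votes for the
  // highest-priority candidate; a strict majority of ALL members is needed.
  const votes = alive.length;
  const majority = Math.floor(members.length / 2) + 1;
  return votes >= majority ? candidates[0].name : null;
}
```

With one Primary, one Secondary and one Arbiter, losing the Primary leaves 2 of 3 votes, so the Secondary is elected. Without the Arbiter, the lone Secondary has 1 of 2 votes and no new Primary can be chosen, which is the situation the Arbiter exists to avoid.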
Define the special types of secondary nodes in replication
- Priority 0 (the secondary can never become primary)
- Hidden member (invisible to client applications)
- Delayed member (stores copies of the Primary's data “with a delay”)
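These member types are set in the replica set configuration. In the mongo shell you would read it with `rs.conf()`, edit the member fields and apply it with `rs.reconfig()`. Here the object is a local, hypothetical example (host names included), and the checks encode the documented rules that hidden and delayed members must have priority 0. `secondaryDelaySecs` is the field name from MongoDB 5.0 onward (earlier versions call it `slaveDelay`).

```javascript
// Hypothetical replica set configuration, shaped like the object that
// rs.conf() returns in the mongo shell (only a few fields shown).
const replSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db0:27017", priority: 1 },
    { _id: 1, host: "db1:27017", priority: 0 },               // priority 0
    { _id: 2, host: "db2:27017", priority: 0, hidden: true }, // hidden
    { _id: 3, host: "db3:27017", priority: 0, hidden: true,
      secondaryDelaySecs: 3600 },                             // delayed (1 h)
  ],
};

// Documented rules: hidden and delayed members must have priority 0.
function configErrors(cfg) {
  const errors = [];
  for (const m of cfg.members) {
    if (m.hidden && m.priority !== 0) errors.push(`${m.host}: hidden needs priority 0`);
    if (m.secondaryDelaySecs > 0 && m.priority !== 0) errors.push(`${m.host}: delayed needs priority 0`);
  }
  return errors;
}

// Members a client application can see (hidden ones are invisible).
function visibleHosts(cfg) {
  return cfg.members.filter((m) => !m.hidden).map((m) => m.host);
}

// Members allowed to become Primary.
function electableHosts(cfg) {
  return cfg.members.filter((m) => m.priority > 0).map((m) => m.host);
}
```

A delayed member is normally also hidden, as `db3` is here, so that applications do not read its intentionally stale data.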
Describe the data storage strategy used by MongoDB (WiredTiger storage engine)
- Snapshot: an in-memory view of the latest data written/modified
- Checkpoint: every 60 seconds or 2 GB of journal data, the snapshot is written to disk, becoming the new durable copy of the data (while a checkpoint is being written, the previous one remains valid until the new one is completely written, so in case of an error the previous checkpoint can be restored).
- Journal: MongoDB writes all its operations on the snapshot to a log file (the journal), so that writes made after the last checkpoint can be recovered (on restart, the last checkpoint is restored and the journal is replayed).
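A minimal sketch of how snapshot, checkpoint and journal work together during crash recovery. It is a hypothetical model, not WiredTiger's implementation: "disk" is just object fields, checkpoints are triggered manually instead of by the 60-second / 2 GB thresholds, and the journal is treated as durable, so a crash only clears the in-memory snapshot.

```javascript
// Hypothetical sketch of the snapshot / checkpoint / journal strategy.
// The journal and checkpoint are treated as durable ("on disk");
// crash() only wipes the in-memory snapshot.
class StorageEngine {
  constructor() {
    this.memory = {};     // in-memory snapshot of the latest data
    this.checkpoint = {}; // last durable copy written to "disk"
    this.journal = [];    // writes made since the last checkpoint
  }

  write(key, value) {
    this.journal.push({ key, value }); // log the write first (write-ahead)
    this.memory[key] = value;
  }

  // Persist the snapshot; the old checkpoint is replaced only once the new
  // copy is complete, and the journal can then be truncated.
  takeCheckpoint() {
    const newCheckpoint = { ...this.memory };
    this.checkpoint = newCheckpoint;
    this.journal = [];
  }

  crash() {
    this.memory = {};
  }

  // Recovery: start from the last checkpoint, then replay the journal.
  recover() {
    this.memory = { ...this.checkpoint };
    for (const { key, value } of this.journal) this.memory[key] = value;
  }
}
```

Without the journal replay, any write made after the last checkpoint (up to a minute of work in the default schedule) would be lost after a crash.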
What is the command to sort data in descending order, limiting the results to 8 documents?
db.orders.find().sort({amount: -1}).limit(8)
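To spell out what the shell command does, here is a plain-JavaScript equivalent on a hypothetical in-memory array (`topOrders` is an illustrative helper, not a MongoDB API). MongoDB applies the sort before the limit regardless of the order in which the two cursor methods are chained, which is what the sort-then-slice order below reproduces.

```javascript
// Plain-JavaScript equivalent of find().sort({amount: -1}).limit(8),
// run on a hypothetical in-memory array instead of a MongoDB collection.
function topOrders(orders, n) {
  return [...orders]                     // copy: do not mutate the input
    .sort((a, b) => b.amount - a.amount) // -1 in MongoDB = descending
    .slice(0, n);                        // limit(n)
}
```

Calling `topOrders(orders, 8)` returns the 8 orders with the largest `amount`, largest first.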
How do you declare a cursor?
var myCursor = db.inventory.find()
find() returns a cursor (an iterator over the matching documents). To iterate it manually in the mongo shell (JavaScript):
while (myCursor.hasNext()) { var document = myCursor.next(); /* … process the document */ }
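A cursor does not hold all results at once: it pulls documents from the server a batch at a time, and `hasNext()`/`next()` hide that. The sketch below simulates this with a hypothetical `BatchedCursor` class over a local array; the batch size is an arbitrary parameter, not MongoDB's default.

```javascript
// Hypothetical cursor that fetches results in batches, mimicking how a
// MongoDB cursor pulls documents from the server a batch at a time.
class BatchedCursor {
  constructor(results, batchSize) {
    this.results = results;     // stands in for the server-side result set
    this.batchSize = batchSize;
    this.buffer = [];           // documents fetched but not yet returned
    this.position = 0;          // next unread index in `results`
    this.batchesFetched = 0;
  }

  fetchBatch() {
    this.buffer = this.results.slice(this.position, this.position + this.batchSize);
    this.position += this.buffer.length;
    this.batchesFetched += 1;
  }

  // Fetches a new batch only when the local buffer is empty.
  hasNext() {
    if (this.buffer.length === 0 && this.position < this.results.length) {
      this.fetchBatch();
    }
    return this.buffer.length > 0;
  }

  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.buffer.shift();
  }
}
```

The same `while (cursor.hasNext()) { ... cursor.next() ... }` loop used in the shell works unchanged on this object; iterating 5 documents with a batch size of 2 triggers 3 fetches.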
Link to a list of MongoDB commands
https://gist.github.com/bradtraversy/f407d642bdc3b31681bc7e56d95485b6