Lecture 13: Document Stores Flashcards
MongoDB
A document-oriented NoSQL database designed for large volumes of unstructured or semi-structured data. Its schema-less design, efficient data encoding, and support for complex querying and indexing make it flexible and scalable, and a popular choice for modern web and mobile applications that work with dynamic data.
Advantages of MongoDB
- Schemaless nature: No schema is enforced by default, allowing documents within a collection to have different structures.
- JSON/BSON Storage: Documents are written and read as JSON-like structures, making them easy to work with. Internally they are stored as BSON (Binary JSON) for efficient data encoding and decoding.
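As a small illustration of the schemaless point, the sketch below models a collection as a plain Python list. The collection name `users`, its fields, and the helper `field_names` are all hypothetical and only show that two documents in one collection need not share a structure.

```python
import json

# A hypothetical "users" collection, modeled as a plain list of documents.
# The two documents deliberately have different fields.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Linus", "languages": ["C", "Shell"],
     "address": {"city": "Helsinki"}},
]


def field_names(collection):
    """Return the sorted top-level field names of each document."""
    return [sorted(doc.keys()) for doc in collection]


if __name__ == "__main__":
    # Each document serializes to JSON without any shared schema.
    for doc in users:
        print(json.dumps(doc))
```

A relational table would need nullable columns or extra tables to hold both rows; here each document simply carries the fields it has.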
BSON (Binary JSON)
- Lightweight and Traversable:
o Designed for efficient encoding and decoding.
o Supports embedding, reducing the need for joins.
- Data Types:
o BSON supports a variety of data types, including strings, numbers, objects, arrays, binary data, and more.
Key features
- _id Field:
o Every document has a unique _id field, serving as the primary key.
o Typically an ObjectId, which is unique and whose leading 4 bytes encode the creation time in seconds.
- CRUD Operations:
o Supports Create, Read, Update, Delete (CRUD) operations.
o Writes to a single document are atomic.
- Indexing:
o Supports indexing to improve query performance.
o Can index any field within a document.
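The link between an ObjectId and creation time can be shown with a short sketch. An ObjectId is 12 bytes (24 hex characters), and its first 4 bytes are a big-endian Unix timestamp in seconds. The helper name `objectid_creation_time` is hypothetical; drivers expose their own accessor for this.

```python
from datetime import datetime, timezone


def objectid_creation_time(oid_hex: str) -> datetime:
    """Decode the creation time embedded in an ObjectId's hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 8 hex characters are the 4-byte timestamp in seconds.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # An example id in the usual format.
    print(objectid_creation_time("507f1f77bcf86cd799439011"))
    # prints 2012-10-17 21:13:27+00:00
```

Because the timestamp comes first, sorting documents by `_id` orders them roughly by creation time, with one-second resolution.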
Embedding
Nested documents stored inside a parent document. Used when the “many” side of a relationship is usually accessed together with its parent.
Linking
References to other documents. Used for more flexibility or when relationships are not as tightly coupled.
One-to-One and Many-to-Many Relationships
Flexible modeling of relationships using embedding or linking based on access patterns.
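The two modeling styles can be compared side by side in a rough sketch. The post and comment documents, the `comment_ids` field, and the helper `resolve_comments` are all invented for this example; plain dicts stand in for collections.

```python
# Embedding: the comments live inside the post document,
# so one read returns the post together with its comments.
embedded_post = {
    "_id": 1,
    "title": "Intro to document stores",
    "comments": [
        {"author": "ana", "text": "Nice overview"},
        {"author": "bo", "text": "Thanks"},
    ],
}

# Linking: the post stores only references (here, _id values);
# the comments live in a separate collection keyed by _id.
linked_post = {
    "_id": 1,
    "title": "Intro to document stores",
    "comment_ids": [10, 11],
}
comments = {
    10: {"_id": 10, "author": "ana", "text": "Nice overview"},
    11: {"_id": 11, "author": "bo", "text": "Thanks"},
}


def resolve_comments(post, comment_collection):
    """Follow the references: one extra lookup per linked comment."""
    return [comment_collection[cid] for cid in post["comment_ids"]]


if __name__ == "__main__":
    print(embedded_post["comments"])                # no extra lookups
    print(resolve_comments(linked_post, comments))  # follows two references
```

Embedding favors reads that always need the children; linking keeps the children independently addressable and lets several parents refer to the same document.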
Advantages over SQL
- Schema Flexibility:
o No predefined schema, allowing for **dynamic and evolving** data models.
- De-normalization:
o Embedding data within documents provides **data locality**, improving access speed and reducing the need for joins.
Replication
MongoDB ensures high availability and data redundancy through replication: a replica set is a group of MongoDB instances that maintain the same data set on different servers.
- Primary node: handles all write operations and serves reads by default.
- Secondary nodes: replicate data from the primary asynchronously and can be promoted to primary if needed.
- Arbiters: participate in elections to choose the primary but do not store data.
- Oplog (operation log): an ordered sequence of operations performed by the primary that secondaries apply to maintain identical data sets.
Because replication is asynchronous, secondaries may lag behind the primary.
Benefits include high availability, as a secondary can be promoted if the primary fails, and data redundancy, as multiple copies protect against data loss.
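The oplog mechanism can be illustrated with a toy in-memory simulation. This is a sketch under heavy simplifications, not how MongoDB is implemented: the `Node` and `ReplicaSet` classes are invented here, elections are reduced to picking the most up-to-date secondary, and writes that no secondary has applied are simply dropped on failover.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

    def apply(self, entry):
        op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied += 1


class ReplicaSet:
    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self.oplog = []  # ordered list of operations performed by the primary

    def write(self, op, key, value=None):
        """All writes go to the primary and are recorded in the oplog."""
        entry = (op, key, value)
        self.oplog.append(entry)
        self.primary.apply(entry)

    def sync(self, secondary, max_entries=None):
        """Apply pending oplog entries; max_entries models replication lag."""
        pending = self.oplog[secondary.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for entry in pending:
            secondary.apply(entry)

    def fail_over(self):
        """Promote the most up-to-date secondary (stand-in for an election)."""
        new_primary = max(self.secondaries, key=lambda n: n.applied)
        self.secondaries.remove(new_primary)
        # Simplification: entries the new primary never applied are lost.
        self.oplog = self.oplog[:new_primary.applied]
        self.primary = new_primary
        return new_primary


if __name__ == "__main__":
    rs = ReplicaSet(Node("a"), [Node("b"), Node("c")])
    rs.write("set", "x", 1)
    rs.write("set", "y", 2)
    lagging = rs.secondaries[0]
    rs.sync(lagging, max_entries=1)
    print(lagging.data)  # the secondary has only seen the first write so far
```

Replaying the same ordered log on every secondary is what keeps the copies identical, and the gap between `applied` and the length of the oplog is the replication lag.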
Auto-sharding
Allows MongoDB to **handle large datasets** and high throughput by distributing data across multiple machines.
Key components of MongoDB’s auto-sharding:
- Shards: individual instances holding subsets of the data, each functioning as a replica set for high availability.
- Config servers: store metadata and configuration settings.
- Query routers (mongos): route client queries to the appropriate shards.
Sharding uses a shard key to distribute data into chunks across shards. On insertion, the shard key determines the target shard; queries are routed by mongos to the correct shards; and the balancer keeps chunks evenly distributed to prevent bottlenecks. The benefits of sharding include **horizontal scaling, improved read and write performance**, and the ability to handle very large datasets by adding more shards.
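The routing and balancing steps can be sketched with a toy in-memory cluster. Everything here is an illustrative assumption: the `Cluster` class, the `user_id` shard key, the fixed count of 8 chunks, and the hash-modulo chunk assignment are invented for the example, whereas MongoDB itself assigns chunks by ranges of shard key values and uses its own balancing policy.

```python
import hashlib

NUM_CHUNKS = 8  # assumed fixed chunk count, for illustration only


def chunk_for(shard_key_value) -> int:
    """Map a shard key value to a chunk number via a stable hash."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CHUNKS


class Cluster:
    def __init__(self, shard_names):
        # shard name -> {chunk number -> list of documents}
        self.shards = {name: {} for name in shard_names}
        # Plays the role of config-server metadata: chunk -> owning shard.
        self.chunk_map = {c: shard_names[c % len(shard_names)]
                          for c in range(NUM_CHUNKS)}

    def insert(self, doc, shard_key="user_id"):
        """The shard key decides which chunk, and so which shard, gets the doc."""
        chunk = chunk_for(doc[shard_key])
        shard = self.chunk_map[chunk]
        self.shards[shard].setdefault(chunk, []).append(doc)
        return shard

    def find(self, value, shard_key="user_id"):
        """Like a query router: consult the metadata and ask only one shard."""
        chunk = chunk_for(value)
        shard = self.chunk_map[chunk]
        return [d for d in self.shards[shard].get(chunk, [])
                if d[shard_key] == value]

    def chunk_counts(self):
        owners = list(self.chunk_map.values())
        return {s: owners.count(s) for s in self.shards}

    def add_shard(self, name):
        self.shards[name] = {}
        self.balance()

    def balance(self):
        """Move chunks from the fullest to the emptiest shard until even."""
        while True:
            counts = self.chunk_counts()
            busiest = max(counts, key=counts.get)
            idlest = min(counts, key=counts.get)
            if counts[busiest] - counts[idlest] <= 1:
                return
            chunk = next(c for c, s in self.chunk_map.items() if s == busiest)
            self.chunk_map[chunk] = idlest
            docs = self.shards[busiest].pop(chunk, None)
            if docs is not None:
                self.shards[idlest][chunk] = docs


if __name__ == "__main__":
    cluster = Cluster(["shardA", "shardB"])
    for i in range(100):
        cluster.insert({"user_id": i})
    cluster.add_shard("shardC")
    print(cluster.chunk_counts())
```

Adding a shard changes only the chunk-to-shard map and moves whole chunks, so lookups by shard key keep working without any change to the documents themselves.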