Week 5: NoSQL Databases and MongoDB Flashcards

Question 1

Q

NoSQL

Answer

A

NoSQL databases are non-relational, highly scalable, and fault-tolerant. They are designed for large, distributed, semi-structured and unstructured data, built mostly for queries with few asynchronous inserts and updates, and accessed through API-based query interfaces and data-specific query languages.

Availability is favoured over consistency, approximate answers are acceptable, and overall the system is simpler and faster.

Question 2

Q

ACID

Answer

A

Relational databases have the following 4 properties:

Atomicity: each transaction is a single, indivisible unit.
Consistency: the data is accurate and meets pre-existing requirements after each transaction.
Isolation: concurrent transactions don’t affect each other.
Durability: changes resulting from transactions are stored even in the event of failures.
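A minimal sketch of atomicity, assuming a hypothetical in-memory `accounts` object rather than a real database: both sides of a transfer are applied together, or not at all. The "no negative balance" rule is an assumed consistency requirement for illustration.

```javascript
// Hypothetical in-memory "accounts" store; not a real database API.
function transfer(accounts, from, to, amount) {
  // Work on a copy so that a failure leaves the original state untouched.
  const draft = { ...accounts };
  draft[from] -= amount;
  draft[to] += amount;
  // Assumed consistency rule for this sketch: balances never go negative.
  if (draft[from] < 0) {
    throw new Error("insufficient funds: transaction aborted");
  }
  // Commit: both changes become visible together.
  Object.assign(accounts, draft);
}

const accounts = { alice: 100, bob: 20 };
transfer(accounts, "alice", "bob", 30); // commits
try {
  transfer(accounts, "alice", "bob", 500); // aborts, nothing changes
} catch (err) {
  console.log(err.message);
}
console.log(accounts);
```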

Question 3

Q

BASE

Answer

A

This acronym describes the properties of NoSQL databases.

Basically Available: the client’s request will always be acknowledged. Availability is prioritised even if system failures may jeopardise successful completion of the client’s request.
Soft State: the data may be inconsistent when it’s read.
Eventually Consistent: read requests after write requests may not return consistent results, but they’ll be updated once changes are propagated to all nodes.
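Eventual consistency can be illustrated with a small simulation. This is a sketch under stated assumptions: the `ReplicaNode` and `EventuallyConsistentStore` classes are hypothetical, and propagation is triggered by hand instead of over a network.

```javascript
// Hypothetical two-node store: writes land on node 0 and are copied to node 1 later.
class ReplicaNode {
  constructor() {
    this.data = new Map();
  }
}

class EventuallyConsistentStore {
  constructor() {
    this.nodes = [new ReplicaNode(), new ReplicaNode()];
    this.pending = []; // writes not yet propagated to node 1
  }
  write(key, value) {
    this.nodes[0].data.set(key, value); // acknowledged immediately
    this.pending.push([key, value]);
  }
  read(key, nodeIndex) {
    return this.nodes[nodeIndex].data.get(key);
  }
  propagate() {
    for (const [key, value] of this.pending) {
      this.nodes[1].data.set(key, value);
    }
    this.pending = [];
  }
}

const store = new EventuallyConsistentStore();
store.write("qty", 20);
const staleRead = store.read("qty", 1); // node 1 has not caught up yet
store.propagate();
const freshRead = store.read("qty", 1); // node 1 now sees the write
console.log(staleRead, freshRead);
```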

Question 4

Q

3 V’s of Big Data

Answer

A

Volume: NoSQL databases allow scaling out (adding more commodity servers as nodes).

Velocity: fast writes using schema-on-read (the schema is applied to the data as they are read from the database, not when they are written). This allows for low write latency (adding nodes decreases latency).

Variety: can store semi-structured and unstructured data (schema is loose or non-existent).
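The schema-on-read idea under Velocity can be sketched as follows. The `rawStore` array, the `write` and `readWithSchema` functions, and the default values are all assumptions made for illustration; no real database works exactly this way.

```javascript
// Writes store the raw record without validation (the fast path).
const rawStore = [];
function write(record) {
  rawStore.push(JSON.stringify(record));
}

// The (hypothetical) schema is applied only when data is read.
// Each schema entry maps a field name to the fallback used when the field is missing.
function readWithSchema(schema) {
  return rawStore.map((raw) => {
    const doc = JSON.parse(raw);
    const shaped = {};
    for (const [field, fallback] of Object.entries(schema)) {
      shaped[field] = field in doc ? doc[field] : fallback;
    }
    return shaped;
  });
}

write({ item: "box", qty: 20 });
write({ item: "mat", colour: "gray" }); // different shape, still accepted
const rows = readWithSchema({ item: null, qty: 0 });
console.log(rows);
```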

Question 5

Q

RDBMS vs NoSQL

Answer

A

Elastic Scaling:
- RDBMS scales up, with a bigger server handling bigger loads.
- NoSQL scales out by distributing data across multiple hosts seamlessly.

Big Data:
- RDBMS doesn’t scale up well to handle big data.
- NoSQL is designed for big data.

DBA Specialists:
- RDBMS requires highly trained experts to monitor DB.
- NoSQL requires less management, automatically repairs itself, and has simpler data models.

Flexible Data Models:
- RDBMS needs careful schema change management.
- NoSQL databases don’t need complicated schema management.

Economic Cost:
- RDBMS relies on expensive proprietary servers to manage data.
- NoSQL uses clusters of cheap commodity servers to manage data and transaction volumes. The cost per gigabyte or transactions/second for NoSQL can be lower than the cost for RDBMS.

Lack of Expertise:
- There are plenty of experienced RDBMS developers.
- There are fewer NoSQL developers.

Analytics and Business Intelligence:
- RDBMS is designed for analytics.
- NoSQL is designed for the needs of Web 2.0, not for ad hoc data queries.

Question 6

Q

NoSQL Database Types:

Answer

A

Key/Value: “Hashtable” of keys
Examples: Redis, Riak

Document: stores documents comprised of tagged elements
Examples: MongoDB, CouchDB

Column-family: each storage block contains data from one column
Examples: Cassandra, HBase

Graph: stores graph-structured data (nodes and edges)
Examples: Neo4j, HyperGraphDB

Question 7

Q

Key-value Databases

Answer

A

They store key-value pairs, with keys being unique. Values are only retrievable using keys and are opaque to the database. Key-value pairs are organised into collections/buckets. Data are partitioned across nodes by keys. The partition for a key is determined by hashing the key.

Pros:
- Very fast, simple model, able to scale horizontally
- Good for unstructured data, fast read/writes, when a key suffices for identifying a value, no dependencies among values, and simple insert/delete/select operations.

Cons:
- Many data structures (objects) can’t be easily modelled as key-value pairs
- Not good for operations (search, filter, update) on individual attributes of a value, and operations on multiple keys in a single transaction.
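Partitioning by key hash, as described in this card, might look like the sketch below. The `hashKey` function is a simple djb2-style hash chosen for brevity, and `PartitionedStore` is a hypothetical class; real key-value stores use stronger hash functions and more elaborate placement schemes.

```javascript
// Simple string hash (djb2-style), kept as an unsigned 32-bit integer.
function hashKey(key) {
  let hash = 5381;
  for (const ch of key) {
    hash = (hash * 33 + ch.charCodeAt(0)) >>> 0;
  }
  return hash;
}

// Hypothetical store: each node is a Map, and a key's node is chosen by its hash.
class PartitionedStore {
  constructor(nodeCount) {
    this.nodes = Array.from({ length: nodeCount }, () => new Map());
  }
  nodeFor(key) {
    return hashKey(key) % this.nodes.length;
  }
  put(key, value) {
    this.nodes[this.nodeFor(key)].set(key, value);
  }
  get(key) {
    // The value is opaque: it can only be fetched whole, by its key.
    return this.nodes[this.nodeFor(key)].get(key);
  }
}

const kv = new PartitionedStore(3);
kv.put("user:1", { name: "Ada" });
kv.put("user:2", { name: "Lin" });
console.log(kv.nodeFor("user:1"), kv.get("user:1"));
```

Because the node is derived from the key alone, any client can locate a value without a central lookup, which is part of why this model scales horizontally.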

Question 8

Q

Document Databases

Answer

A

These store documents in semi-structured form. A document has a nested structure, in JSON or XML format.

Suitable for:
- Semi-structured data with a flat or nested schema.
- Search for different values of the document.
- Updates on subsets of values.
- CRUD (Create, Read, Update, Delete) operations.
- Schema changes are likely.

Unsuitable for:
- Binary data.
- Updates on multiple documents in a single transaction.
- Joins between multiple documents.

Question 9

Q

Key-value vs Document Databases

Answer

A

In document databases, each document has a unique key.
Document databases provide more support for value operations because they’re aware of values: selection operations can retrieve fields or parts of values, subsets of values can be updated together, indexes are supported, and each document has a schema that can be inferred from the structure of the value.

Question 10

Q

Column-family Databases

Answer

A

These databases store columns, with each column having a name and value. Columns related to each other are grouped into rows. Rows don’t necessarily have a fixed schema or number of columns.

Suitable for:
- Data that has a tabular structure with many columns and sparsely populated rows.
- Columns that are interrelated and accessed together often.
- OLAP (Online Analytical Processing).
- Realtime random read-write is needed.
- Insert/select/update/delete operations.

Unsuitable for:
- Joins.
- ACID support is needed.
- Binary Data.
- SQL-compliant queries.
- Frequently changing query patterns that lead to column restructuring.

Applications:
- Data warehousing
- Data Mining
- Google BigTable
- RDF (Resource Description Framework)
- Info Retrieval
- Scientific Datasets

Question 11

Q

Graph Databases

Answer

A

Data is stored in a graph-like structure. Nodes represent entities and have sets of attributes. Edges represent relationships and have sets of attributes. These databases are optimised for representing connections, as adding and removing edges and attributes is easy. The underlying storage can be native graph storage, relational database, key/value database, document database, etc.

Suitable for:
- Data comprised of interconnected entities.
- Queries are based on entity relationships.
- Need to find groups of interconnected entities.
- Need to find distances between entities.

Unsuitable for:
- Joins.
- ACID support is needed.
- Binary data.
- SQL-compliant queries.
- Frequently changing query patterns that lead to column restructuring.

Applications:
- Social
- Recommendation
- Geography
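Finding the distance between two entities is a typical graph query. The sketch below uses a breadth-first search over a hypothetical social graph held in a plain adjacency list; a graph database would run this kind of traversal through its own query language.

```javascript
// Hypothetical social graph stored as an adjacency list.
const edges = {
  ann: ["bob", "cat"],
  bob: ["ann", "dan"],
  cat: ["ann"],
  dan: ["bob", "eve"],
  eve: ["dan"],
};

// Breadth-first search: returns the number of edges on the shortest path,
// or -1 if the goal cannot be reached from the start.
function distance(graph, start, goal) {
  const seen = new Set([start]);
  let frontier = [start];
  let hops = 0;
  while (frontier.length > 0) {
    if (frontier.includes(goal)) return hops;
    const next = [];
    for (const node of frontier) {
      for (const neighbour of graph[node] || []) {
        if (!seen.has(neighbour)) {
          seen.add(neighbour);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
    hops += 1;
  }
  return -1;
}

console.log(distance(edges, "ann", "eve")); // path ann -> bob -> dan -> eve
```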

Question 12

Q

MongoDB

Answer

A

It’s a document database. It’s hash-based, meaning that it stores hashes (system-assigned _id) with keys and values for each document. MongoDB has a dynamic schema and uses the BSON (Binary JSON) format. It has APIs for many languages.

Question 13

Q

MongoDB: Insert

Answer

A

Example:
To insert a document with _id of 10, field item with value of “box”, and field qty with a value of 20,

db.products.insert({_id:10,item:"box",qty:20})

Example:
Inserting multiple documents,

db.inventory.insertMany([
{item:"journal",qty:25,tags:["blank","red"],size:{h:14,w:21,uom:"cm"}},
{item:"mat",qty:85,tags:["gray"],size:{h:27.9,w:35.5,uom:"cm"}}
])

Question 14

Q

MongoDB: Find

Answer

A

Example:
Finding documents with a quantity greater than 4,

db.products.find({qty:{$gt:4}})
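To see what a filter such as `{qty: {$gt: 4}}` selects, here is a plain-JavaScript imitation that runs without a MongoDB connection. The `matches` function is hypothetical and handles only `$gt` and exact matches; MongoDB itself supports many more operators.

```javascript
// Returns true when the document satisfies every condition in the filter.
// Only {field: value} and {field: {$gt: n}} are handled in this sketch.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object" && "$gt" in condition) {
      return doc[field] > condition.$gt;
    }
    return doc[field] === condition;
  });
}

// Sample documents invented for the example.
const products = [
  { _id: 10, item: "box", qty: 20 },
  { _id: 11, item: "pen", qty: 3 },
  { _id: 12, item: "mat", qty: 85 },
];

const found = products.filter((doc) => matches(doc, { qty: { $gt: 4 } }));
console.log(found.map((doc) => doc.item));
```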

Question 15

Q

MongoDB: Update

Answer

A

Example:

db.books.update(
  {_id:1},
  {
    $inc:{stock:5},
    $set:{
      item:"ABC123",
      "info.publisher":"2222",
      tags:["software"],
      "ratings.1":{by:"xyz",rating:3}
    }
  }
)
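The effect of `$inc` and `$set` with dotted paths such as `"info.publisher"` and `"ratings.1"` can be imitated in plain JavaScript. This is a sketch only: `setPath`, `getPath`, and `applyUpdate` are hypothetical helpers, the starting document is invented, and just these two operators are covered.

```javascript
// Sets a value at a dotted path such as "info.publisher", creating parents as needed.
function setPath(doc, path, value) {
  const parts = path.split(".");
  let target = doc;
  for (const part of parts.slice(0, -1)) {
    if (typeof target[part] !== "object" || target[part] === null) {
      target[part] = {};
    }
    target = target[part];
  }
  target[parts[parts.length - 1]] = value;
}

// Reads the value at a dotted path, or undefined if any part is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, part) => (value == null ? undefined : value[part]), doc);
}

// Applies only the $inc and $set operators, modifying the document in place.
function applyUpdate(doc, update) {
  for (const [path, amount] of Object.entries(update.$inc || {})) {
    setPath(doc, path, (getPath(doc, path) || 0) + amount);
  }
  for (const [path, value] of Object.entries(update.$set || {})) {
    setPath(doc, path, value);
  }
  return doc;
}

const book = {
  _id: 1,
  item: "XYZ",
  stock: 10,
  info: { publisher: "1111" },
  ratings: [{ by: "a", rating: 4 }, { by: "b", rating: 5 }],
};
applyUpdate(book, {
  $inc: { stock: 5 },
  $set: {
    item: "ABC123",
    "info.publisher": "2222",
    tags: ["software"],
    "ratings.1": { by: "xyz", rating: 3 }, // replaces the second array element
  },
});
console.log(book);
```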

Question 16

Q

MongoDB: Remove

Answer

A

Example:
Remove all documents in the collection “products”,

db.products.remove({})

Example:
Remove all documents with item=box

db.products.remove({"item":"box"})

Example:
Remove all documents with a quantity greater than 20,

db.products.remove(
  {qty:{$gt:20}}
)

Question 17

Q

MongoDB: Index Support

Answer

A

Users can create, view, and drop indexes.

Commands:
View
db.system.indexes.find()

Get Indexes on collectionA
db.collectionA.getIndexes()

Drop all indexes (other than the required one on _id)
db.collectionA.dropIndexes()

Drop index with name “catIdx”
db.collectionA.dropIndex("catIdx")

Create 2dsphere Index on “loc” field
db.collectionA.createIndex({loc:"2dsphere"})

Question 18

Q

MongoDB: Replication

Answer

A

This is a feature of MongoDB. Multiple replicas of datasets are stored. This provides scalability, availability, and fault tolerance. The primary instance (replica) receives operation requests. The secondary instances apply operations to their data. If the primary instance doesn’t communicate with its secondaries for over 10 seconds, one of the secondary instances becomes the new primary instance after elections.
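The failover rule can be sketched as below. This is an illustration under loud assumptions: the member objects, the `lastHeartbeat` and `lastApplied` fields, and the "most up-to-date secondary wins" rule are all invented here as a stand-in for the real election, which involves voting among members.

```javascript
const ELECTION_TIMEOUT_MS = 10000; // the 10-second window from the card

// Hypothetical members: lastHeartbeat is when a member was last heard from,
// lastApplied is how far it has replayed the primary's operations.
function electIfNeeded(members, now) {
  const primary = members.find((m) => m.role === "primary");
  if (primary && now - primary.lastHeartbeat <= ELECTION_TIMEOUT_MS) {
    return primary.name; // primary is still reachable, nothing to do
  }
  if (primary) primary.role = "secondary";
  // Simplified election (an assumption): the most up-to-date reachable member wins.
  const candidates = members.filter(
    (m) => m !== primary && now - m.lastHeartbeat <= ELECTION_TIMEOUT_MS
  );
  if (candidates.length === 0) return null;
  candidates.sort((a, b) => b.lastApplied - a.lastApplied);
  candidates[0].role = "primary";
  return candidates[0].name;
}

const members = [
  { name: "A", role: "primary", lastHeartbeat: 0, lastApplied: 42 },
  { name: "B", role: "secondary", lastHeartbeat: 11500, lastApplied: 40 },
  { name: "C", role: "secondary", lastHeartbeat: 11800, lastApplied: 41 },
];
const newPrimary = electIfNeeded(members, 12000); // A has been silent for 12 seconds
console.log(newPrimary);
```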

Question 19

Q

MongoDB: Sharding

Answer

A

This process horizontally partitions the dataset into shards that are distributed across multiple nodes. Each node is only responsible for its shard. If shards are unavailable, partial reads and writes help with availability.

Benefits:
- Efficient reads and writes: they’re distributed across shards.
- Storage capacity: each shard has part of the dataset.
- High availability: partial read/write operations are performed if shards are unavailable.
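A toy model of sharding with one shard down shows what partial reads and writes mean. Everything here is assumed for illustration: the `shards` array, the numeric `_id` used as shard key, and the modulo placement rule are not how MongoDB assigns data to shards.

```javascript
// Hypothetical cluster of three shards; shard 1 is marked unavailable.
const shards = [
  { up: true, docs: [] },
  { up: false, docs: [] },
  { up: true, docs: [] },
];

// Assumed shard key: the numeric _id, placed on a shard by modulo.
function shardFor(id) {
  return id % shards.length;
}

function insert(doc) {
  const shard = shards[shardFor(doc._id)];
  if (!shard.up) return false; // this write fails, writes to other shards still succeed
  shard.docs.push(doc);
  return true;
}

// Scatter-gather read that skips unavailable shards and returns a partial result.
function findAll() {
  return shards.filter((s) => s.up).flatMap((s) => s.docs);
}

const results = [0, 1, 2, 3, 4, 5].map((id) => insert({ _id: id }));
const visibleIds = findAll().map((d) => d._id);
console.log(results, visibleIds);
```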

Question 20

Q

MongoDB: App Server

Answer

A

Each App Server has a single router (mongos), which acts as an interface between the applications and the sharded cluster. It processes all requests and decides how the query is distributed based on the metadata from the config server.

Question 21

Q

MongoDB: Config Servers (replica set)

Answer

A

These store the metadata and configuration settings for clusters. Config servers in sharded clusters can be implemented as replica sets.

Question 22

Q

MongoDB: Shard (replica set)

Answer

A

To benefit from replication, shards and config servers may be implemented as replica sets.

Question 23

Q

MongoDB: MapReduce Functionality

db.orders.mapReduce(
  function() {emit(this.cust_id,this.amount);},
  function(key,values) {return Array.sum(values);},
  {
    query: {status: "A"},
    out: "order_totals"
  }
)

Answer

A

The function() line is the map step: it pairs a value with a key and emits the key-value pair.

The function(key,values) line is the reduce step: it reduces all values associated with a particular key to a single object.

The query line selects the input documents to the map function.

The out line is the location of the result.

“this” refers to the document that the map-reduce operation is processing.
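The same steps can be traced in plain JavaScript without a MongoDB connection. This is a hedged imitation, not the server's implementation: the `mapReduce` function below is hypothetical, `emit` is passed in as an argument instead of being a global, the query is a predicate function, and the sample orders are invented.

```javascript
// Plain-JavaScript imitation of the map-reduce steps.
function mapReduce(docs, map, reduce, options) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs
    .filter(options.query) // "query": choose the input documents
    .forEach((doc) => map.call(doc, emit)); // "map": `this` is the current document
  const out = [];
  for (const [key, values] of groups) {
    out.push({ _id: key, value: reduce(key, values) }); // "reduce": one result per key
  }
  return out;
}

const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "B212", amount: 300, status: "D" },
];

const orderTotals = mapReduce(
  orders,
  function (emit) { emit(this.cust_id, this.amount); },
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  { query: (doc) => doc.status === "A" }
);
console.log(orderTotals);
```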

Question 24

Q

CAP Theorem

Answer

A

This theorem applies to distributed systems and deals with the trade-offs between three properties:
1. Consistency
2. Availability
3. Partition Tolerance (the system continues to operate even if network partitions divide the system into isolated groups).

A distributed system can guarantee at most two of these three properties at the same time.
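The trade-off shows up when a replica is cut off from its peers: it must either refuse the request, which preserves consistency, or answer from local and possibly stale data, which preserves availability. A minimal sketch, assuming a hypothetical `replica` object and a `mode` flag invented for this example:

```javascript
// Hypothetical replica that has lost contact with its peers during a network partition.
function handleRead(replica, key, mode) {
  if (replica.partitioned && mode === "CP") {
    // Consistency first: refuse rather than risk returning stale data.
    return { ok: false, error: "unavailable during partition" };
  }
  // Availability first (or no partition): answer from local data, which may be stale.
  return { ok: true, value: replica.data[key] };
}

const replica = { partitioned: true, data: { qty: 20 } };
const cpAnswer = handleRead(replica, "qty", "CP");
const apAnswer = handleRead(replica, "qty", "AP");
console.log(cpAnswer, apAnswer);
```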

(24 cards)