Revature NoSQL, Hadoop, Hive,UNIX, Distributed Systems Flashcards
What does BASE stand for?
BASE stands for Basically Available, Soft state, Eventual consistency, which contrasts with the ACID properties of traditional databases.
What is the CAP Theorem?
CAP Theorem states that a distributed data store can only provide two out of the following three guarantees: Consistency, Availability, and Partition Tolerance.
What does CAP mean for our distributed data stores when they have network problems?
In the event of network problems (partition tolerance), a distributed data store must sacrifice either consistency or availability.
What is a database in Mongo?
In MongoDB, a database is a container for collections, which are groups of documents.
What is a collection?
A collection is a group of MongoDB documents, similar to a table in a relational database.
What is a document?
A document is a record in MongoDB, represented in BSON (Binary JSON) format.
What rules does Mongo enforce about the structure of documents inside a collection?
MongoDB does not enforce a strict schema, allowing documents in a collection to have different fields and structures.
What is a distributed application?
A distributed application is software that runs on multiple computers or nodes, often to provide scalability and fault tolerance.
What is a distributed data store?
A distributed data store is a database that stores data across multiple nodes or servers.
What is High Availability? How is it achieved in Mongo?
High Availability ensures a system remains operational even during failures. In MongoDB, it is achieved using replica sets.
What is Scalability? How is it achieved in Mongo?
Scalability refers to the ability to handle increased loads by scaling horizontally or vertically. MongoDB achieves this using sharding for horizontal scaling.
Explain replica sets and sharding.
Replica sets are groups of MongoDB servers that provide redundancy and high availability. Sharding splits data across multiple servers to handle large datasets and high throughput.
What are NoSQL databases? What are the different types of NoSQL databases?
NoSQL databases are non-relational databases. Types include document, key-value, column-family, and graph databases.
What kind of NoSQL database MongoDB is?
MongoDB is a document-based NoSQL database.
Which are the most important features of MongoDB?
MongoDB features include schema-less design, high performance, horizontal scaling, built-in replication, and support for geospatial queries.
What is a Namespace in MongoDB?
A Namespace in MongoDB is a combination of the database name and the collection name.
Which all languages can be used with MongoDB?
MongoDB supports many languages including Python, Java, Node.js, Ruby, PHP, C#, and more.
Compare SQL databases and MongoDB at a high level.
SQL databases are relational and enforce schemas, while MongoDB is schema-less and uses documents for flexibility.
How is MongoDB better than other SQL databases?
MongoDB provides flexibility, horizontal scaling, and faster iteration for applications with varying data structures.
Compare MongoDB and CouchDB at high level.
MongoDB uses a BSON-based document model, supports rich queries, and high write performance. CouchDB focuses on replication and offline-first design using JSON documents.
Does MongoDB support foreign key constraints?
No, MongoDB does not support foreign key constraints.
Does MongoDB support ACID transaction management and locking functionalities?
MongoDB supports ACID transactions at the document level and multi-document transactions starting from version 4.0.
How can you achieve primary key - foreign key relationships in MongoDB?
You can achieve this by embedding related documents or using references between documents.
When should we embed one document within another in MongoDB?
Embedding is ideal when related data is frequently accessed together and has a one-to-one or one-to-few relationship.
Explain can you move old files in the moveChunk directory?
Yes, you can move or delete old files in the moveChunk directory after ensuring they are no longer in use.
Mention what is ObjectId composed of?
ObjectId is composed of a 12-byte value: a 4-byte timestamp, 5 bytes of random value, and 3 bytes of an incrementing counter.
What was the Hadoop Explosion?
The Hadoop Explosion refers to the rapid adoption and development of the Hadoop ecosystem for big data processing in the 2000s.
What about CDH?
CDH stands for Cloudera Distribution of Hadoop. It is one of the ways Hadoop is used in the wild. Cloudera and HortonWorks, two major companies supporting Hadoop clusters, merged a few years ago.
What are some differences between hard disk space and RAM?
Hard disk space provides long-term storage with slower access times, while RAM provides temporary, faster-access storage for active processes.
What is a VM?
A VM (Virtual Machine) is a software-based emulation of a computer system that runs its own operating system.