MongoDB Flashcards
What is MongoDB
A NoSQL (non-relational) document database
Collection
A group of documents in MongoDB. Collections do not enforce a rigid schema.
Document
A record stored in a JSON-like format (BSON). Documents in the same collection can have different structures.
Visualize a collection
[
    {
        "name": "Bob",
        "age": 30
    },
    {
        "name": "Charlie",
        "age": 35,
        "hobbies": ["cycling", "hiking"]
    }
]
Visualize a document
{"name": "Alice", "age": 25}
Documents are written as field-value pairs inside curly braces ({...}).
Query using .find()
db.collection.find()
General querying; returns matching documents.
$gt
db.collection.find({"field_name": {"$gt": value}})
Matches values greater than a specified value.
Update a single document in a collection
$set
db.collection.updateOne({"filter_by_fieldname": "filter_value"}, {"$set": {"age": modified_data}})
Delete a single document
.deleteOne
db.collection.deleteOne({"filter_name": "condition"})
Delete multiple documents
.deleteMany
db.users.deleteMany({"age": {"$lt": 30}}) // Deletes users younger than 30
Delete a collection
.drop()
db.collection.drop()
Import MongoDB into Python (PyMongo)
from pymongo import MongoClient
# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
Connect to or access a database
db = client.name_of_database
# Create or access the collection
my_collection = db.name_of_collection
Insert a single document
my_collection.insert_one({"name": "Lorraine", "age": "Forever Young"})
.find() vs. .find_one()
.find() - Retrieves all documents that match a query.
.find_one() - Retrieves only the first matching document.
Get the field names present in a document
# A collection object has no .keys(); read the keys of a fetched document instead
my_collection_fields = my_collection.find_one().keys()
print(my_collection_fields)
Save a list of the fields of a document
my_collection_fields = list(my_collection.find_one().keys())
print(my_collection_fields)
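Because documents in one collection can have different fields, the keys of a single document may not show every field in use. A minimal sketch, assuming the documents have already been fetched into a list (the `docs` list below is a made-up stand-in for something like `list(my_collection.find())`):

```python
# Made-up documents standing in for the result of a find() call
docs = [
    {"name": "Bob", "age": 30},
    {"name": "Charlie", "age": 35, "hobbies": ["cycling", "hiking"]},
]

# Union of the keys of every document, sorted for a stable order
field_names = sorted(set().union(*(doc.keys() for doc in docs)))
print(field_names)
```

On a large collection this reads every document, so sampling a subset may be the more practical choice.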
How do you filter?
db.collection.find(
    {"field_name": {"$operator": "value"}},   # Filter condition
    {"field1": 1, "field2": 1, "_id": 0}      # Projection (optional)
).sort("field_to_sort", 1).limit(10)
# Create a filter variable with constraints
filter_doc = {
    "gender": "male",
    "surname": {"$regex": "A$"}   # MongoDB has no SQL-style "%" wildcard; use $regex instead
}
Value in a list of allowed values
$in: <list>
db.myCollection.count_documents({
    "Countries": {
        "$in": ["France", "USA"]
    }
})
Create filter criteria to count laureates who died ("diedCountry") in the USA ("USA"). Save the document count as count.
# Create a filter for laureates who died in the USA
criteria = {"diedCountry": "USA"}
# Save the count of these laureates
count = db.laureates.count_documents(criteria)
print(count)
$ne
Matches values that are not equal to a specified value.
criteria = {"diedCountry": "USA",
            "bornCountry": {"$ne": "USA"}}
How many laureates were born in “USA”, “Canada”, or “Mexico”? Save a filter as criteria and your count as count.
# Save a filter for laureates born in the USA, Canada, or Mexico
criteria = {"bornCountry":
            {"$in": ["Canada", "Mexico", "USA"]}}
# Count them and save the count
count = db.laureates.count_documents(criteria)
print(count)
$exists
Check for the existence of a field, regardless of its content.
db.collection_name.count_documents({"prizes.0": {"$exists": True}})
field.#
"field_name.0"
Dot notation references an array element by its position: "field_name.0" is the first element of the field_name array.
T or F: This is how you would query for documents without the field "born"
criteria = {"born": {"$exists": True}}
False. That matches documents that have the field.
This matches documents without the field:
criteria = {"born": {"$exists": False}}
Use .distinct() to list the countries that appear in the diedCountry field but not in the bornCountry field
countries = set(db.laureates.distinct("diedCountry")) - set(db.laureates.distinct("bornCountry"))
print(countries)
len()
The len() function in Python calculates the number of elements in the list returned by .distinct().
set()
Converts the returned data into a Python set. A set is an unordered collection of unique elements in Python, which is ideal for performing set operations like union, intersection, and difference.
When do you need to use set()?
Eliminate Duplicates from a collection.
Perform Set Operations such as union, intersection, difference, or symmetric difference.
Ensure the uniqueness of elements in a collection.
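The set operations listed above can be sketched with plain Python lists. The country values below are made up and only stand in for what `.distinct()` might return:

```python
# Made-up values standing in for the results of two .distinct() calls
died_countries = ["USA", "France", "Barbados", "USA"]
born_countries = ["USA", "France", "Poland"]

died = set(died_countries)   # duplicates are removed
born = set(born_countries)

only_died = died - born      # difference: in died but not in born
either = died | born         # union: in at least one of the two
both = died & born           # intersection: in both
one_side_only = died ^ born  # symmetric difference: in exactly one

print(only_died, both, one_side_only, len(either))
```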
Visualize how to find the countries in which USA-born laureates had affiliations for their prizes.
db.laureates.distinct("prizes.affiliations.country", {"bornCountry": "USA"})
.distinct()
Retrieves unique values for a field.
db.prizes.distinct("category")
.findOne()
Retrieves a single matching document (find_one() in PyMongo).
db.laureates.findOne({"bornCountry": "USA"})
.count_documents()
Counts the number of documents matching a query.
db.laureates.count_documents({"bornCountry": "USA"})
.aggregate()
Performs advanced queries, transformations, or grouping.
db.prizes.aggregate([{"$group": {"_id": "$category", "count": {"$sum": 1}}}])
What are the ways to query in MongoDB
.find, .distinct, .findOne, .count_documents, .aggregate
$elemMatch
A query operator used to match elements in an array field. It matches documents where at least one single element of the array satisfies all of the given conditions.
Visualize how to use $elemMatch to find the number of laureates who won an unshared ({"share": "1"}) prize in physics after World War II ({"year": {"$gte": "1945"}}).
db.laureates.count_documents({"prizes":
    {"$elemMatch": {
        "share": "1",
        "category": "physics",
        "year": {"$gte": "1945"}
    }}})
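The difference between $elemMatch and separate dotted-field conditions can be illustrated in plain Python. This is only a sketch of the matching rule, not MongoDB's implementation; the laureate document and the helper names (`satisfies`, `elem_match`, `separate_conditions`) are hypothetical:

```python
# Made-up laureate with two prizes; years are strings, as in the query above
laureate = {
    "surname": "Example",
    "prizes": [
        {"category": "physics", "year": "1930", "share": "2"},
        {"category": "chemistry", "year": "1950", "share": "1"},
    ],
}

# One test per field, mirroring the three conditions of the query
conditions = {
    "category": lambda value: value == "physics",
    "share": lambda value: value == "1",
    "year": lambda value: value >= "1945",
}

def satisfies(element, field, test):
    """True if the element has the field and its value passes the test."""
    return field in element and test(element[field])

def elem_match(elements, conditions):
    """$elemMatch rule: one single element must pass every condition."""
    return any(
        all(satisfies(e, f, t) for f, t in conditions.items())
        for e in elements
    )

def separate_conditions(elements, conditions):
    """Dotted-field rule: each condition may be met by a different element."""
    return all(
        any(satisfies(e, f, t) for e in elements)
        for f, t in conditions.items()
    )

print(elem_match(laureate["prizes"], conditions))           # False
print(separate_conditions(laureate["prizes"], conditions))  # True
```

No single prize here is an unshared physics prize from 1945 or later, so the $elemMatch rule rejects the document even though each condition is met by some prize.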
$eq
{ age: { $eq: 25 } }
Matches values equal to a specified value.
$gte
{ age: { $gte: 25 } }
Matches values greater than or equal to a specified value.
$lt and $lte
{ age: { $lt: 25 } } and { age: { $lte: 25 } }
$lt matches values less than a specified value; $lte matches values less than or equal to it.
$nin
{ age: { $nin: [20, 25, 30] } }
Matches none of the values in a specified array.
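To make the comparison operators concrete, here is a tiny in-memory matcher. It is an illustration only, assuming scalar, non-null field values; it is not how MongoDB evaluates queries, and `matches`, `names`, and the `users` list are hypothetical:

```python
import operator

# Illustration only: Python stand-ins for the comparison operators
COMPARISONS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, allowed: value in allowed,
    "$nin": lambda value, excluded: value not in excluded,
}

def matches(document, criteria):
    """Return True if every condition in criteria holds for document."""
    for field, condition in criteria.items():
        # A bare value such as {"name": "Bob"} is shorthand for $eq
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        for op, operand in condition.items():
            if field not in document:
                # Only the negated operators accept a missing field
                if op in ("$ne", "$nin"):
                    continue
                return False
            if not COMPARISONS[op](document[field], operand):
                return False
    return True

users = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
    {"name": "Charlie", "age": 35},
    {"name": "Dana"},  # no age field
]

def names(criteria):
    """Names of the made-up users that match the criteria."""
    return [user["name"] for user in users if matches(user, criteria)]

print(names({"age": {"$gte": 30}}))
print(names({"age": {"$nin": [20, 25, 30]}}))
```

The second call also returns the user without an age field, since $nin (like $ne) matches documents that lack the field.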
What are the logical operators?
$and, $or, $not, $nor
$nor
Matches documents that fail all conditions.
Use a for loop to print every document in the collection
collection = db.laureates
for document in collection.find():
    print(document)
Visualize how to discover how many laureates in total have a first name beginning with "G" and a surname beginning with "S".
db.laureates.count_documents({"firstname": Regex("^G"), "surname": Regex("^S")})
$regex
An operator used to perform pattern matching on string fields using regular expressions.
{ "field": { "$regex": pattern, "$options": options } }
^ (Caret)
Matches the start of a string.
{ "name": { "$regex": "^J" } }
What matches the end of a string?
$ (Dollar Sign)
{ "name": { "$regex": "son$" } }
What acts as an escape character to include special characters in a regex pattern?
\ (Backslash)
{ "email": { "$regex": "\\." } }
$options
The $options modifier controls how the regex matches strings. For example, "i" makes the match case-insensitive:
{ "name": { "$regex": "john", "$options": "i" } }
Common $options:
i: Case-insensitive matching.
m: Multiline mode.
x: Ignore whitespace and comments in the pattern.
s: Allows . to match newline characters.
u: Enables Unicode case folding.
.*
Matches any sequence of characters (zero or more).
How do you match strings with special characters?
{ "field": { "$regex": "\\$" } }
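The anchors and the escape character can be tried out locally with Python's built-in re module, which treats these basic constructs the same way. A small sketch with made-up names; `re.escape` is a convenient way to escape every special character in a literal string:

```python
import re

names = ["Johnson", "Jackson", "sonja", "Anderson"]

# ^ anchors the pattern at the start, $ at the end of the string
starts_with_j = [n for n in names if re.search(r"^J", n)]
ends_with_son = [n for n in names if re.search(r"son$", n)]

# An unescaped "." matches any character; "\." matches only a literal dot
dot_any = re.search(r"a.b", "a-b") is not None
dot_literal = re.search(r"a\.b", "a-b") is not None

# re.escape() escapes special characters such as "$" and "."
price_pattern = re.escape("$5.00")
price_found = re.search(price_pattern, "It costs $5.00 today") is not None

# re.IGNORECASE plays the role of the "i" option
case_insensitive = re.search("john", "JOHNSON", re.IGNORECASE) is not None

print(starts_with_j, ends_with_son, dot_any, dot_literal, price_found)
```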
Visualize using a regular expression object to filter for laureates with “Germany” in their “bornCountry” value.
criteria = {"bornCountry": Regex("Germany")}
print(set(db.laureates.distinct("bornCountry", criteria)))
How do you import Regex for MongoDB?
from bson.regex import Regex
.insert_many()
db.myCollection.insert_many([
    {"name": "Charlie", "age": 35},
    {"name": "Diana", "age": 28}
])
projection={}
Specifies which fields to include or exclude in the query results.
To include only certain fields in the result, set their value to 1 in the projection. To exclude fields, set their value to 0. Note:
You cannot mix 1 (include) and 0 (exclude) in the same projection, except for _id.
docs = db.collection.find(
    filter={"age": {"$gt": 30}},                # Filter: find documents where age > 30
    projection={"name": 1, "age": 1, "_id": 0}  # Include only "name" and "age"; exclude "_id"
)
Visualize using a projection dictionary to get only the first name and surname of each matching laureate.
# Use projection to select only firstname and surname
docs = db.laureates.find(
    filter={"firstname": {"$regex": "^G"},
            "surname": {"$regex": "^S"}},
    projection={"firstname": 1, "surname": 1})
# Print the first document
print(docs[0])
Iterate over the documents, and for each document, concatenate the first name and the surname fields together with a space in between to obtain full names.
# Use projection to select only firstname and surname
docs = db.laureates.find(
    filter={"firstname": {"$regex": "^G"},
            "surname": {"$regex": "^S"}},
    projection=["firstname", "surname"])
# Iterate over docs and concatenate first name and surname
full_names = [doc["firstname"] + " " + doc["surname"] for doc in docs]
# Print the full names
print(full_names)
itemgetter
itemgetter(key) is used to retrieve a specific field (or key) from dictionaries within a list.
visualize using itemgetter
from operator import itemgetter
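A short sketch of itemgetter on a list of dictionaries. The two sample documents below are stand-ins for documents returned by a query:

```python
from operator import itemgetter

# Sample documents standing in for query results
docs = [
    {"firstname": "Marie", "surname": "Curie", "year": 1903},
    {"firstname": "Albert", "surname": "Einstein", "year": 1921},
]

# itemgetter("surname") builds a function equivalent to lambda d: d["surname"]
surnames = list(map(itemgetter("surname"), docs))

# With several keys it returns a tuple of the values
name_pairs = list(map(itemgetter("firstname", "surname"), docs))

# It is also handy as a sort key
newest_first = sorted(docs, key=itemgetter("year"), reverse=True)

print(surnames, name_pairs, newest_first[0]["surname"])
```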
Create an index
collection.create_index([("field_name", 1)])
You would use 1 for ascending, -1 for descending.
Purpose: Speed up queries by category and sort by year.
Purpose: Identify all unique prize categories.
Purpose: Get the most recent prize for a category awarded to a single laureate.
Purpose: Create a human-readable report.
Encapsulation: Wrap the logic in a function with parameters for collection and field names for reuse.
def laureate_report(collection, category_field="category", year_field="year", share_field="laureates.share", share_value="1"):
    # Create a compound index on the category and year fields
    index_model = [(category_field, 1), (year_field, -1)]
    collection.create_index(index_model)
    report = ""
    # For each distinct category (sorted alphabetically)
    for category in sorted(collection.distinct(category_field)):
        # Most recent document in this category with the given share value
        doc = collection.find_one(
            {category_field: category, share_field: share_value},
            sort=[(year_field, -1)])
        # Append the formatted category and year to the report
        report += "{category}: {year}\n".format(**doc)
    return report
An empty {}
An empty {} means “match everything”, so it retrieves all documents.
db.users.find({}, {"age": 0})
Visualize retrieving a given page of prize data on laureates who have the word “particle” (use $regex) in their prize motivations (“prizes.motivation”). Sort laureates first by ascending “prizes.year” and next by ascending “surname”.
particle_laureates = list(
    db.laureates.find(
        {"prizes.motivation": {"$regex": "particle"}},
        ["firstname", "surname", "prizes"])
    .sort([("prizes.year", 1), ("surname", 1)])
    .skip(page_size * (page_number - 1))
    .limit(page_size))
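The skip value in the query above follows from the page number and page size. A minimal sketch of that arithmetic, using list slicing on made-up records in place of a cursor; `page_bounds` is a hypothetical helper:

```python
def page_bounds(page_number, page_size):
    """Return (skip, limit) for a 1-based page number."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be at least 1")
    return page_size * (page_number - 1), page_size

# Made-up records standing in for sorted query results
records = list(range(1, 11))

skip, limit = page_bounds(page_number=3, page_size=3)
page = records[skip:skip + limit]  # same effect as .skip(skip).limit(limit)
print(skip, limit, page)
```

Page 1 skips nothing, page 2 skips one full page, and so on.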
Visualize how to get the below output with projection:
{"name": "Bob", "age": 30}
{"name": "Charlie", "age": 35}
db.users.find({}, {"name": 1, "age": 1, "_id": 0})
Visualize how to exclude age from the below output with projection:
{"_id": ObjectId("…"), "name": "Bob"}
{"_id": ObjectId("…"), "name": "Charlie", "hobbies": ["cycling", "hiking"]}
db.users.find({}, {"age": 0})
Visualize how to retrieve only users where age is greater than 30
db.users.find({"age": {"$gt": 30}})
Visualize how to retrieve a document where name is "Bob"
db.users.find_one({"name": "Bob"})
Visualize how to retrieve 2 users per page, skipping the first 2
db.users.find().skip(2).limit(2)
$match
Filters Documents (Like WHERE in SQL)
Filters documents before further processing in the pipeline.
Works just like find(), but inside an aggregation pipeline.
db.users.aggregate([
    { "$match": { "age": { "$gt": 30 } } }
])
SQL equivalent: SELECT * FROM users WHERE age > 30;
$project
Reshapes Output (Like SELECT in SQL)
Controls which fields to include/exclude in the final result.
Can also rename fields and create computed fields.
Can also perform field comparisons and calculations.
SQL equivalent:
SELECT name,
    CASE WHEN age >= 30 THEN 'Senior' ELSE 'Junior' END AS age_category
FROM users;
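One way the SQL above might be written as a $project stage is with the $cond operator. This is a sketch, assuming a users collection with name and age fields; with a live connection it would be passed to db.users.aggregate(pipeline). The `age_category` function is a hypothetical plain-Python mirror of the same rule:

```python
# $project stage with a computed field (sketch, not run against a server here)
pipeline = [
    {
        "$project": {
            "_id": 0,   # the SQL query does not select an id column
            "name": 1,
            "age_category": {
                "$cond": {
                    "if": {"$gte": ["$age", 30]},
                    "then": "Senior",
                    "else": "Junior",
                }
            },
        }
    }
]

def age_category(age):
    """Plain-Python mirror of the $cond expression above."""
    return "Senior" if age >= 30 else "Junior"

print(age_category(30), age_category(29))
```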
$group
Groups Documents (Like GROUP BY in SQL)
Groups multiple documents based on a shared field.
Used for aggregations like SUM(), COUNT(), AVG().
$unwind
Expands Arrays (Like JOIN UNNEST in SQL)
Flattens arrays so that each array element becomes a separate document.
Helpful for analyzing nested array data.
If needed, use { "$unwind": { "path": "$hobbies", "preserveNullAndEmptyArrays": true } } to keep documents whose array field is missing, null, or empty.
What is the SQL equivalent of the aggregation pipeline below?
SELECT hobbies, COUNT(*) AS count
FROM users
WHERE age > 30
GROUP BY hobbies
ORDER BY count DESC;
db.users.aggregate([
    { "$match": { "age": { "$gt": 30 } } },                        # Step 1: Filter users over 30
    { "$unwind": "$hobbies" },                                     # Step 2: Expand hobbies array
    { "$group": { "_id": "$hobbies", "count": { "$sum": 1 } } },   # Step 3: Count hobbies
    { "$sort": { "count": -1 } },                                  # Step 4: Sort by most common
    { "$project": { "hobby": "$_id", "count": 1, "_id": 0 } }      # Step 5: Format output
])
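What each stage of this pipeline does can be traced in plain Python. The sketch below mimics the stages on a made-up list of users; it only illustrates the data flow and is not how MongoDB executes the pipeline:

```python
from collections import Counter

# Made-up users; the last one has no hobbies field
users = [
    {"name": "Bob", "age": 30, "hobbies": ["chess"]},
    {"name": "Charlie", "age": 35, "hobbies": ["cycling", "hiking"]},
    {"name": "Dana", "age": 41, "hobbies": ["hiking"]},
    {"name": "Eve", "age": 52},
]

# $match: keep users older than 30
matched = [u for u in users if u.get("age", 0) > 30]

# $unwind: one entry per array element; users without hobbies drop out
unwound = [hobby for u in matched for hobby in u.get("hobbies", [])]

# $group with {"$sum": 1}: count occurrences of each hobby
counts = Counter(unwound)

# $sort by count descending, then $project into the final shape
result = [
    {"hobby": hobby, "count": count}
    for hobby, count in sorted(counts.items(), key=lambda item: -item[1])
]
print(result)
```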
from collections import OrderedDict
It’s a special type of dictionary (dict) that remembers the order in which items were inserted.
In Python 3.7+, regular dictionaries are also guaranteed to maintain insertion order (in CPython 3.6 this was an implementation detail), but OrderedDict is still useful for older versions or special cases.
Visualize a pipeline creation
pipeline = [
    {"$match": {filter_field: {"$in": list(filter_values)}}},   # Filters documents
    {"$project": {field: 1 for field in project_fields}},       # Selects specific fields
    {"$sort": OrderedDict([(sort_field, sort_order)])}          # Sorts results
]
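The pipeline above relies on variables defined elsewhere. A minimal sketch that wraps it in a function so those inputs become explicit parameters; `build_pipeline` and the example arguments are hypothetical:

```python
from collections import OrderedDict

def build_pipeline(filter_field, filter_values, project_fields,
                   sort_field, sort_order=1):
    """Build a match -> project -> sort aggregation pipeline."""
    return [
        # Keep documents whose filter_field is one of filter_values
        {"$match": {filter_field: {"$in": list(filter_values)}}},
        # Include only the requested fields
        {"$project": {field: 1 for field in project_fields}},
        # Sort by one field: 1 is ascending, -1 is descending
        {"$sort": OrderedDict([(sort_field, sort_order)])},
    ]

pipeline = build_pipeline(
    filter_field="category",
    filter_values=("physics", "chemistry"),
    project_fields=["category", "year"],
    sort_field="year",
    sort_order=-1,
)
print(pipeline)
```

With a live connection, the result would be passed to a call such as db.prizes.aggregate(pipeline).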
Aggregations (use of the $)
A process of transforming and analyzing data in MongoDB. Instead of just retrieving documents (find()), aggregation processes multiple documents to compute statistics, filter, group, or reshape data.
🔹 Example: Instead of listing all sales, we might want to compute the total revenue.
Explain Field references → “fieldName” vs. “$fieldName”
✅ Use $ Before a Field Name to Reference It
When you want to refer to a field inside an aggregation stage, you must use $.
T or F: These aggregation operators $match and $group start with $
True. All MongoDB aggregation operators (like $match, $group, $sum, etc.) start with $.
When do you not use $?
For string literals
For field names outside of aggregations
$lookup
Joins Collections (Like JOIN in SQL)
Performs a left outer join between two collections.
Adds matching documents from another collection.
T or F: db["prizes"].distinct("category") is the same as db.prizes.distinct("category")
True.
When to Use Brackets Instead of Dots?
If your collection name has special characters or spaces:
db["prizes-data"].find()
If your collection name matches a Python keyword:
db["class"].find()
$expr
$expr allows you to use aggregation expressions inside a query filter, such as find() or a $match stage.
It enables comparisons between fields within the same document, which an ordinary query filter cannot do.
It allows you to use operators like $gt, $eq, $and, etc., on document fields dynamically.
True or false: $expr and $match both perform field comparisons and computations.
False. Only $expr allows field comparisons and computations. On its own, $match does standard filtering with simple conditions like field = value or field > value; to compare two fields inside $match, wrap the comparison in $expr.
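A field-to-field comparison with $expr can be sketched as follows. The field names "spent" and "budget" and the sample documents are hypothetical; the list comprehension only mirrors the comparison in plain Python, and with a live connection the filter would be passed to find():

```python
# Query filter comparing two fields of the same document (hypothetical fields)
query = {"$expr": {"$gt": ["$spent", "$budget"]}}

# Made-up documents standing in for a collection
docs = [
    {"name": "Project A", "spent": 120, "budget": 100},
    {"name": "Project B", "spent": 80, "budget": 100},
]

# Plain-Python mirror of the same comparison
over_budget = [d["name"] for d in docs if d["spent"] > d["budget"]]
print(over_budget)
```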
.stats()
db.collection.stats()
Use this to see size, storage, and index details.
db.collection.explain()
db.collection.find({"Rank": {"$gte": 4}}).explain("executionStats")
Use this to see how MongoDB executes a query.
.aggregate()
MongoDB’s .aggregate() function is used for advanced data processing. It allows you to filter, group, sort, transform, and analyze data efficiently.
Think of .aggregate() as a pipeline where documents pass through stages that modify or filter them.
.createIndex()
db.collection.createIndex({ "Rank": 1 })
.getIndexes()
Lists the indexes that exist on a collection.
db.collection.getIndexes()
Visualize using $project to flag employees whose bonus is greater than 10% of their salary
db.employees.aggregate([
    {"$project": {"name": 1, "salary": 1, "bonus": 1,
                  "bonus_gt_10_percent": {
                      "$gt": ["$bonus", {"$multiply": ["$salary", 0.1]}]}}}])
$indexOfBytes
Finds the position of a substring inside a string.
Returns the zero-based index of the first occurrence, or -1 if the substring is not found (never True/False).
Case-sensitive. Counts positions in UTF-8 bytes, whereas $indexOfCP counts Unicode code points.
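The difference between a byte index and a code point index shows up as soon as the text contains a multi-byte character. A plain-Python sketch of that difference (str.find counts code points, bytes.find counts bytes); the sample text is made up:

```python
# "\u00e9" is the letter e with an acute accent, 2 bytes in UTF-8
text = "caf\u00e9 au lait"
needle = "au"

# Position counted in Unicode code points (the $indexOfCP way of counting)
code_point_index = text.find(needle)

# Position counted in UTF-8 bytes (the $indexOfBytes way of counting)
byte_index = text.encode("utf-8").find(needle.encode("utf-8"))

# Like the MongoDB operators, find() returns -1 when nothing matches
missing_index = text.find("tea")

print(code_point_index, byte_index, missing_index)
```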
$addFields
Add a new field inside a sub-document (address.full)
db.users.aggregate([
    {"$addFields": {"address.full": {"$concat": ["$address.city", ", ", "$address.country"]}}}])
Input:
{ "name": "Alice", "address": { "city": "New York", "country": "USA" } }
Output:
{ "name": "Alice", "address": { "city": "New York", "country": "USA", "full": "New York, USA" } }
Create a new database
use my_database
The database is actually created once data is first stored in it.