ORM Flashcards
How do you find the query associated with a queryset?
You have a model called Event. For getting all records, you will write something like Event.objects.all(), then do str(queryset.query)
> > > queryset = Event.objects.all()
str(queryset.query)
SELECT “events_event”.”id”, “events_event”.”epic_id”,
“events_event”.”details”, “events_event”.”years_ago”
FROM “events_event”
or
> > > queryset = Event.objects.filter(years_ago__gt=5)
str(queryset.query)
SELECT “events_event”.”id”, “events_event”.”epic_id”, “events_event”.”details”,
“events_event”.”years_ago” FROM “events_event”
WHERE “events_event”.”years_ago” > 5
How do you use OR filter queries in Django ORM?
Say you want find all users with firstname starting with ‘R’ and last_name starting with ‘D’.
ORM query would look like
queryset = User.objects.filter( first_name\_\_startswith='R' ) | User.objects.filter( last_name\_\_startswith='D' ) queryset , , , , ]> The | represents the or logical operator
Alternatively, you can use the Q objects.
qs = User.objects.filter(Q(first_name__startswith=’R’)|Q(last_name__startswith=’D’))
The Q represents the or logical operator
How do you use AND filter queries in Django ORM?
The default way to combine multiple conditions in filter is AND, so you can just do.
queryset_1 = User.objects.filter( first_name\_\_startswith='R', last_name\_\_startswith='D' ) Alternatively, you can explicitly use the & operator on querysets.
queryset_2 = User.objects.filter( first_name\_\_startswith='R' ) & User.objects.filter( last_name\_\_startswith='D' ) For complete customisability, you can use the Q objects.
queryset_3 = User.objects.filter(
Q(first_name__startswith=’R’) &
Q(last_name__startswith=’D’)
)
queryset_1
, , ]>
How do you use NOT filter queries in Django ORM?
Method 1 using exclude
Method 2 using Q() method
> > > from django.db.models import Q
queryset = User.objects.filter(~Q(id__lt=5))
queryst
, , , , , ]>
How to do union of two querysets from same or different models?
The UNION operator is used to combine the result-set of two or more querysets. The querysets can be from the same or from different models. When the querysets are from different models, the fields and their datatypes should match.
>>> q1 = User.objects.filter(id\_\_gte=5) >>> q1 , , , , , ]> >>> q2 = User.objects.filter(id\_\_lte=9) >>> q2 , , , , , , , , ]> >>> q1.union(q2) , , , , , , , , , ]> >>> q2.union(q1) , , , , , , , , , ]> The union operation can be performed only with the querysets having same fields and the datatypes. Hence our last union operation encountered error. You can do a union on two models as long as they have same fields or same subset of fields.
Since Hero and Villain both have the name and gender, we can use values_list to limit the selected fields then do a union.
Hero.objects.all().values_list( "name", "gender" ).union( Villain.objects.all().values_list( "name", "gender" )) This would give you all Hero and Villain objects with their name and gender.
How to select some fields only in a queryset?
Django provides two ways to do this
values and values_list methods on queryset.
only_method
Say, we want to get first_name and last_name of all the users whose name starts with R. You do not want the fetch the other fields to reduce the work the DB has to do.
>>> User.objects.filter( first_name\_\_startswith='R' ).values('first_name', 'last_name') > queryset = User.objects.filter( first_name\_\_startswith='R' ).only("first_name", "last_name") str(queryset.query), gives us
SELECT “auth_user”.”id”, “auth_user”.”first_name”, “auth_user”.”last_name”
FROM “auth_user” WHERE “auth_user”.”first_name”::text LIKE R%
The only difference between only and values is only also fetches the id.
How to do a subquery expression in Django?
Django allows using SQL subqueries. Let’s start with something simple, We have a UserParent model which has OnetoOne relation with auth user. We will find all the UserParent which have a UserParent.
> > > from django.db.models import Subquery
users = User.objects.all()
UserParent.objects.filter(user_id__in=Subquery(users.values(‘id’)))
, , ]>
Now for something more complex. For each Category, we want to find the most benevolent Hero.
The models look something like this.
class Category(models.Model): name = models.CharField(max_length=100)
class Hero(models.Model): # ... name = models.CharField(max_length=100) category = models.ForeignKey(Category, on_delete=models.CASCADE)
benevolence_factor = models.PositiveSmallIntegerField( help_text="How benevolent this hero is?", default=50 ) You can find the most benevolent Hero like this
hero_qs = Hero.objects.filter( category=OuterRef("pk") ).order_by("-benevolence_factor") Category.objects.all().annotate( most_benevolent_hero=Subquery( hero_qs.values('name')[:1] ) ) If you look at the generated sql, you will see
SELECT “entities_category”.”id”,
“entities_category”.”name”,
(SELECT U0.”name”
FROM “entities_hero” U0
WHERE U0.”category_id” = (“entities_category”.”id”)
ORDER BY U0.”benevolence_factor” DESC
LIMIT 1) AS “most_benevolent_hero”
FROM “entities_category”
Let’s break down the queryset logic. The first part is
hero_qs = Hero.objects.filter(
category=OuterRef(“pk”)
).order_by(“-benevolence_factor”)
We are ordering the Hero object by benevolence_factor in DESC order, and using category=OuterRef(“pk”) to declare that we will be using it in a subquery.
Then we annotate with most_benevolent_hero=Subquery(hero_qs.values(‘name’)[:1]), to get use the subquery with a Category queryset. The hero_qs.values(‘name’)[:1] part picks up the first name from subquery.
How to filter a queryset with criteria based on comparing their field values
Django ORM makes it easy to filter based on fixed values. To get all User objects with first_name starting with ‘R’, you can do User.objects.filter(first_name__startswith=’R’).
What if you want to compare the first_name and last name? You can use the F object. Create some users first.
In [27]: User.objects.create_user(email=”shabda@example.com”, username=”shabda”, first_name=”Shabda”, last_name=”Raaj”)
Out[27]:
In [28]: User.objects.create_user(email=”guido@example.com”, username=”Guido”, first_name=”Guido”, last_name=”Guido”)
Out[28]:
Now you can find the users where first_name==last_name
In [29]: User.objects.filter(last_name=F(“first_name”))
Out[29]: ]>
F also works with calculated field using annotate. What if we wanted users whose first and last names have same letter?
You can set the first letter from a string using Substr(“first_name”, 1, 1), so we do.
In [41]: User.objects.create_user(email=”guido@example.com”, username=”Tim”, first_name=”Tim”, last_name=”Teters”)
Out[41]:
#…
In [46]: User.objects.annotate(first=Substr(“first_name”, 1, 1), last=Substr(“last_name”, 1, 1)).filter(first=F(“last”))
Out[46]: , ]>
How to filter FileField without any file?
A FileField or ImageField stores the path of the file or image. At the DB level they are same as a CharField.
So to find FileField without any file we can query as under.
no_files_objects = MyModel.objects.filter(
Q(file=’’)|Q(file=None)
How to perform join operations in django ORM?
A SQL Join statement is used to combine data or rows from two or more tables based on a common field between them. Join can be carried out in many ways. Some are shown below.
> > > a1 = Article.objects.select_related(‘reporter’) // Using select_related
a1
, , , , , ]>
print(a1.query)
SELECT “events_article”.”id”, “events_article”.”headline”, “events_article”.”pub_date”, “events_article”.”reporter_id”, “events_article”.”slug”, “auth_user”.”id”, “auth_user”.”password”, “auth_user”.”last_login”, “auth_user”.”is_superuser”, “auth_user”.”username”, “auth_user”.”first_name”, “auth_user”.”last_name”, “auth_user”.”email”, “auth_user”.”is_staff”, “auth_user”.”is_active”, “auth_user”.”date_joined” FROM “events_article” INNER JOIN “auth_user” ON (“events_article”.”reporter_id” = “auth_user”.”id”) ORDER BY “events_article”.”headline” ASC
a2 = Article.objects.filter(reporter__username=’John’)
a2
, , , , ]>
print(a2.query)
SELECT “events_article”.”id”, “events_article”.”headline”, “events_article”.”pub_date”, “events_article”.”reporter_id
How to find second largest record using Django ORM ?
You would across situations when you want to find second highest user depending on their age or salary.
Though the ORM gives the flexibility of finding first(), last() item from the queryset but not nth item. You can do it using the slice operator.
We can find Nth records from the query by using slice operator.
> > > user = User.objects.order_by(‘-last_login’)[1] // Second Highest record w.r.t ‘last_login’
user.first_name
‘Raghu’
user = User.objects.order_by(‘-last_login’)[2] // Third Highest record w.r.t ‘last_login’
user.first_name
‘Sohan’
User.objects.order_by(‘-last_login’)[2] only pulls up the required object from db using LIMIT … OFFSET. If you look at the generated sql, you would see something like this.
SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" ORDER BY "auth_user"."last_login" DESC LIMIT 1 OFFSET 2
Find rows which have duplicate field values
Say you want all users whose first_name matches another user.
You can find duplicate records using the technique below.
> > > duplicates = User.objects.values(
‘first_name’
).annotate(name_count=Count(‘first_name’)).filter(name_count__gt=1)
duplicates
If you need to fill all the records, you can do
> > > records = User.objects.filter(first_name__in=[item[‘first_name’] for item in duplicates])
print([item.id for item in records])
[2, 11, 13]
How to find distinct field values from queryset?
You want to find users whose names have not been repeated. You can do this like this
distinct = User.objects.values(
‘first_name’
).annotate(
name_count=Count(‘first_name’)
).filter(name_count=1)
records = User.objects.filter(first_name__in=[item[‘first_name’] for item in distinct])
This is different from User.objects.distinct(“first_name”).all(), which will pull up the first record when it encounters a distinct first_name.
How to use Q objects for complex queries?
If you want to OR your conditions.
>>> from django.db.models import Q >>> queryset = User.objects.filter( Q(first_name\_\_startswith='R') | Q(last_name\_\_startswith='D') ) >>> queryset , , , , ]> If you want to AND your conditions.
> > > queryset = User.objects.filter(
Q(first_name__startswith=’R’) & Q(last_name__startswith=’D’)
)
queryset
, , ]>
If you want to find all users whose first_name starts with ‘R’, but not if the last_name has ‘Z’
> > > queryset = User.objects.filter(
Q(first_name__startswith=’R’) & ~Q(last_name__startswith=’Z’)
)
If you look at the generated query, you would see
SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE ("auth_user"."first_name"::text LIKE R% AND NOT ("auth_user"."last_name"::text LIKE Z%)) You can combine the Q objects in more complex ways to generate complex queries.
How to group records in Django ORM?
Grouping of records in Django ORM can be done using aggregation functions like Max, Min, Avg, Sum. Django queries help to create, retrieve, update and delete objects. But sometimes we need to get aggregated values from the objects. We can get them by example shown below
> > > from django.db.models import Avg, Max, Min, Sum, Count
User.objects.all().aggregate(Avg(‘id’))
{‘id__avg’: 7.571428571428571}
User.objects.all().aggregate(Max(‘id’))
{‘id__max’: 15}
User.objects.all().aggregate(Min(‘id’))
{‘id__min’: 1}
User.objects.all().aggregate(Sum(‘id’))
{‘id__sum’: 106}
How to efficiently select a random object from a model?
Your category models is like this.
class Category(models.Model): name = models.CharField(max_length=100)
class Meta: verbose_name_plural = "Categories"
def \_\_str\_\_(self): return self.name You want to get a random Category. We will look at few alternate ways to do this.
The most straightforward way, you can order_by random and fetch the first record. It would look something like this.
def get_random(): return Category.objects.order_by("?").first() Note: order_by('?') queries may be expensive and slow, depending on the database backend you’re using. To test other methods, we need to insert one million records in Category table. Go to your db like with python manage.py dbshell and run this.
INSERT INTO entities_category (name) (SELECT Md5(Random() :: text) AS descr FROM generate_series(1, 1000000)); You don’t need to understand the full details of the sql above, it creates one million numbers and md5-s them to generate the name, then inserts it in the DB.
Now, instead of sorting the whole table, you can get the max id, generate a random number in range [1, max_id], and filter that. You are assuming that there have been no deletions.
In [1]: from django.db.models import Max
In [2]: from entities.models import Category
In [3]: import random
In [4]: def get_random2():
…: max_id = Category.objects.all().aggregate(max_id=Max(“id”))[‘max_id’]
…: pk = random.randint(1, max_id)
…: return Category.objects.get(pk=pk)
…:
In [5]: get_random2()
Out[5]:
In [6]: get_random2()
Out[6]:
If your models has deletions, you can slightly modify the functions, to loop until you get a valid Category.
In [8]: def get_random3():
…: max_id = Category.objects.all().aggregate(max_id=Max(“id”))[‘max_id’]
…: while True:
…: pk = random.randint(1, max_id)
…: category = Category.objects.filter(pk=pk).first()
…: if category:
…: return category
…:
In [9]: get_random3()
Out[9]:
In [10]: get_random3()
Out[10]:
Unless your model has a lot of deletions, the while True: loop return quickly. Lets use timeit to see the differences.
In [14]: timeit.timeit(get_random3, number=100)
Out[14]: 0.20055226399563253
In [15]: timeit.timeit(get_random, number=100)
Out[15]: 56.92513192095794
get_random3 is about 283 time faster than get_random. get_random is the most generic way, but the technique in get_random3 will work unless you change changed the default way Django generates the id - autoincrementing integers, or there have been too many deletions.
How to use arbitrary database functions in querysets?
Django comes with functions like Lower, Coalesce and Max, but it can’t support all database functions, expecially ones which are database specific.
Django provides Func which allows using arbitrary database functions, even if Django doesn’t provide them.
Postgres has fuzzystrmatch, which provides several functions to determine similarities. Install the extension in your postgres DB with create extension fuzzystrmatch
We will use the levenshtein function. Lets first create some Hero objects.
Hero.objects.create(name=”Zeus”, description=”A greek God”, benevolence_factor=80, category_id=12, origin_id=1)
Hero.objects.create(name=”ZeuX”, description=”A greek God”, benevolence_factor=80, category_id=12, origin_id=1)
Hero.objects.create(name=”Xeus”, description=”A greek God”, benevolence_factor=80, category_id=12, origin_id=1)
Hero.objects.create(name=”Poseidon”, description=”A greek God”, benevolence_factor=80, category_id=12, origin_id=1)
We want to find out the Hero objects which have name similar to Zeus. You can do
from django.db.models import Func, F Hero.objects.annotate(like_zeus=Func(F('name'), function='levenshtein', template="%(function)s(%(expressions)s, 'Zeus')")) The like_zeus=Func(F('name'), function='levenshtein', template="%(function)s(%(expressions)s, 'Zeus')") took two arguments which allowed the database representation, viz, function and template. If you need to reuse the function, you can define a class like this.
class LevenshteinLikeZeus(Func): function='levenshtein' template="%(function)s(%(expressions)s, 'Zeus')" And then use Hero.objects.annotate(like_zeus=LevenshteinLikeZeus(F("name")))
You can then filter on this annotated field like this.
In [16]: Hero.objects.annotate( ...: like_zeus=LevenshteinLikeZeus(F("name")) ...: ).filter( ...: like_zeus\_\_lt=2 ...: ) ...: Out[16]: , , ]>
How to create multiple objects in one shot?
There are conditions when we want to save multiple objects in one go. Say we want to add multiple categories at once and we don’t want to make many queries to the database. We can use bulk_create for creating multiple objects in one shot.
Here is an example.
>>> Category.objects.all().count() 2 >>> Category.objects.bulk_create( [Category(name="God"), Category(name="Demi God"), Category(name="Mortal")] ) [, , ] >>> Category.objects.all().count() 5 bulk_create takes a list of unsaved objects.
How to copy or clone an existing model object?
There is no built-in method for copying model instances, it is possible to create new instance with all fields values copied.
If an instance is saved with instance’s pk set to None, the instance is used to create a new record in the DB. That means every field other than the PK is copied.
In [2]: Hero.objects.all().count()
Out[2]: 4
In [3]: hero = Hero.objects.first()
In [4]: hero.pk = None
In [5]: hero.save()
In [6]: Hero.objects.all().count()
Out[6]: 5
How to ensure that only one object can be created?
Sometimes you want to ensure that only one record can be created for a model. This is commonly required as application configuration store, or as a locking mechanism to access shared resources.
Let us convert our Origin model to be singleton.
class Origin(models.Model): name = models.CharField(max_length=100)
def save(self, *args, **kwargs): if self.\_\_class\_\_.objects.count(): self.pk = self.\_\_class\_\_.objects.first().pk super().save(*args, **kwargs) What did we do? We overrode the save method, and set the pk to an existing value. This ensures that when create is called and any object exists, an IntegrityError is raised.
How to update denormalized fields in other models on save?
You have models like this.
class Category(models.Model): name = models.CharField(max_length=100) hero_count = models.PositiveIntegerField() villain_count = models.PositiveIntegerField()
class Meta: verbose_name_plural = "Categories"
class Hero(models.Model): name = models.CharField(max_length=100) category = models.ForeignKey(Category, on_delete=models.CASCADE) # ...
class Villain(models.Model): name = models.CharField(max_length=100) category = models.ForeignKey(Category, on_delete=models.CASCADE) # ... You need the hero_count and villain_count, to be updated when new objects are created.
You can do something like this
class Hero(models.Model): # ...
def save(self, *args, **kwargs): if not self.pk: Category.objects.filter(pk=self.category_id).update(hero_count=F('hero_count')+1) super().save(*args, **kwargs)
class Villain(models.Model): # ...
def save(self, *args, **kwargs): if not self.pk: Category.objects.filter(pk=self.category_id).update(villain_count=F('villain_count')+1) super().save(*args, **kwargs) Note how we did not use self.category.hero_count += 1, as update will do a DB update.
The alternative method is using signals. You can do it like this.
from django.db.models.signals import pre_save
from django.dispatch import receiver
@receiver(pre_save, sender=Hero, dispatch_uid="update_hero_count") def update_hero_count(sender, **kwargs): hero = kwargs['instance'] if hero.pk: Category.objects.filter(pk=hero.category_id).update(hero_count=F('hero_count')+1)
@receiver(pre_save, sender=Villain, dispatch_uid=”update_villain_count”)
def update_villain_count(sender, **kwargs):
villain = kwargs[‘instance’]
if villain.pk:
Category.objects.filter(pk=villain.category_id).update(villain_count=F(‘villain_count’)+1)
4.1. Signals vs Overriding .save
Since either of signals of .save can be used for the save behviour, when should you use which one? I follow a simple rule.
If your fields depend on a model you control, override .save
If your fields depend on a model from a 3rd party app, which you do no control, use signals.
How to perform truncate like operation using Django ORM?
Truncate statement in SQL is meant to empty a table for future use. Though Django doesn’t provide a builtin to truncate a table, but still similar result can be achived using delete() method. For example:
>>> Category.objects.all().count() 7 >>> Category.objects.all().delete() (7, {'entity.Category': 7}) >>> Category.objects.all().count() 0 This works, but this uses DELETE FROM ... SQL statement. If you have a large number of records, this can be quite slow. You can add a classmethod to Category if you want to enable truncate.
class Category(models.Model): # ...
@classmethod def truncate(cls): with connection.cursor() as cursor: cursor.execute('TRUNCATE TABLE "{0}" CASCADE'.format(cls._meta.db_table)) Then you can call Category.truncate() to a real database truncate.
What signals are raised by Django during object creation or update?
Django provides signals which allows hooking into a model objects creation and deletion lifecycle. The signals provided by Django are
pre_init post_init pre_save post_save pre_delete post_delete Among these, the most commonly used signals are pre_save and post_save. We will look into them in detail.
6.1. Signals vs overriding .save
Since signals can be used for similar effects as overriding .save, which one to use is a frequent source of confusion. Here is when you should use which.
If you want other people, eg. third party apps, to override or customize the object save behaviour, you should raise your own signals If you are hooking into the save behavior of an app you do not control, you should hook into the post_save or pre_save If you are customizing the save behaviour of apps you control, you should override save. Lets take an example of a UserToken model. This a class used for providing authentication and should get created whenever a User is created.
class UserToken(models.Model): token = models.CharField(max_length=64)
# ...
How to convert string to datetime and store in database?
We can convert a date-string and store it in the database using django in many ways. Few of them are discussed below. Lets say we have a date-string as “2018-03-11” we can not directly store it to our date field, so we can use some dateparser or python library for it.
> > > user = User.objects.get(id=1)
date_str = “2018-03-11”
from django.utils.dateparse import parse_date // Way 1
temp_date = parse_date(date_str)
a1 = Article(headline=”String converted to date”, pub_date=temp_date, reporter=user)
a1.save()
a1.pub_date
datetime.date(2018, 3, 11)
from datetime import datetime // Way 2
temp_date = datetime.strptime(date_str, “%Y-%m-%d”).date()
a2 = Article(headline=”String converted to date way 2”, pub_date=temp_date, reporter=user)
a2.save()
a2.pub_date
datetime.date(2018, 3, 11)