David Ziegler's personal blog of computing, math, and other heroic achievements.


25 Apr 2010

Some Common Django ORM Pitfalls

For the most part, I like the Django ORM because it makes it easy to write reusable code that reads from and writes to the database. I’ve found that the ORM can be a double-edged sword though, as it sometimes becomes too easy to hit the database without noticing. In hindsight, most of the following mistakes are pretty obvious once you understand how the ORM works, but I still see them all the time, so I thought it’d be good to point them out. If you want a more basic guide to Django model and querying patterns, Better Django Models is a great article for that, so I won’t reiterate the points made there.

For the following examples, I’ll be using these models:

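from django.contrib.auth.models import User
from django.db import models

# Django 2.0+ also requires an on_delete argument, e.g. on_delete=models.CASCADE
class Book(models.Model):
    author = models.ForeignKey(User)

class Profile(models.Model):
    user = models.ForeignKey(User)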

1. book.author does a database query

OK, this is pretty basic, but it has a bunch of implications, such as:

book.author.id != book.author_id

Well, the values returned will be the same, but book.author.id has to load the related User from the database first, while book.author_id just reads the column already stored on the book. There is pretty much never a good reason to do book.author.id unless you know for sure that you’re accessing a cached instance, either one obtained from select_related or one created because you’ve already accessed book.author. Even then, why chance it?
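To make the difference concrete, here is a minimal sketch in plain Python that mimics the behavior with a query counter. It is a toy model, not Django's implementation: FakeDatabase, LazyForeignKey, and the plain Book and User classes are hypothetical stand-ins invented for illustration.

```python
class FakeDatabase:
    """Stand-in for a real database that counts the queries it serves."""

    def __init__(self, users):
        self.users = users
        self.query_count = 0

    def get_user(self, user_id):
        self.query_count += 1
        return self.users[user_id]


class LazyForeignKey:
    """Descriptor that loads the related row on first access, then caches it."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if instance._author_cache is None:
            instance._author_cache = instance.db.get_user(instance.author_id)
        return instance._author_cache


class User:
    def __init__(self, id, username):
        self.id = id
        self.username = username


class Book:
    author = LazyForeignKey()

    def __init__(self, db, author_id):
        self.db = db
        self.author_id = author_id   # stored on the book row, no query needed
        self._author_cache = None


db = FakeDatabase({7: User(7, "alice")})
book = Book(db, author_id=7)

id_from_column = book.author_id       # reads the local column
queries_after_column = db.query_count

id_from_relation = book.author.id     # loads the related User first
queries_after_relation = db.query_count

book.author.id                        # served from the cached instance
queries_after_repeat = db.query_count

print(queries_after_column, queries_after_relation, queries_after_repeat)  # 0 1 1
```

Both expressions give the same id, but only the second one costs a query, and the cache only helps once something has already paid for that first lookup.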

For the same reason, this is bad, because reading profile.user loads the whole User just to copy its id onto the book:

book = Book()
book.author = profile.user
book.save()

and this is good, because profile.user_id is already stored on the profile:

book = Book()
book.author_id = profile.user_id
book.save()

2. Querysets are not lists

How many database queries does this make?

books = Book.objects.all()
print books[0]
print books[1]

The answer is 2, one for each index lookup. It’s much easier to see that these are 2 separate queries once you realize that the above is essentially equivalent to

print Book.objects.all()[0]
print Book.objects.all()[1]

This happens because Django’s querysets are lazy, meaning they won’t be evaluated until they’re accessed, and because a queryset’s internal cache doesn’t get populated until the whole queryset is evaluated, for example by iterating through it. If we do:

books = Book.objects.all()
for book in books:
    print book
print books[0]
print books[1]

This will result in one database query, because iterating through the queryset populates the internal cache, and books[0] and books[1] then simply read from that cache. (I don’t recommend iterating through the entire queryset if you only need the first two books; I’m just trying to make a point.)
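The caching rule above can be sketched with a small stand-in class. This is a toy model under stated assumptions, not Django's code: ToyQuerySet, its query_count attribute, and the sample titles are all hypothetical, and only non-negative integer indexes are supported.

```python
class ToyQuerySet:
    """Toy model of a lazy queryset; not Django's actual implementation."""

    def __init__(self, rows):
        self._rows = rows              # pretend these rows live in the database
        self._result_cache = None      # filled only by a full evaluation
        self.query_count = 0

    def _run_query(self, start=None, stop=None):
        self.query_count += 1
        return list(self._rows[start:stop])

    def __iter__(self):
        # Iterating evaluates everything once and remembers the result.
        if self._result_cache is None:
            self._result_cache = self._run_query()
        return iter(self._result_cache)

    def __getitem__(self, index):
        # With a filled cache, indexing is free; otherwise it is a new query.
        if self._result_cache is not None:
            return self._result_cache[index]
        return self._run_query(index, index + 1)[0]


titles = ["Dune", "Emma", "Ulysses"]

uncached = ToyQuerySet(titles)
first, second = uncached[0], uncached[1]
print(uncached.query_count)  # 2

cached = ToyQuerySet(titles)
for title in cached:
    pass
first, second = cached[0], cached[1]
print(cached.query_count)  # 1
```

In real Django code, if you only need the first two books, slicing and then forcing evaluation with list(Book.objects.all()[:2]) fetches both rows in a single limited query.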


3. Use iterator() when you don’t need or want the internal queryset cache

As I just mentioned, iterating through the queryset will populate the internal cache. Sometimes though, the internal cache may not be desirable. For example, if we have one million users:

users = User.objects.all()
for user in users:
    print user.username

this will load one million users into memory, because the queryset’s internal cache will be populated. The iterator() method tells the queryset not to populate the internal cache, which can significantly reduce memory usage and often improves performance.

users = User.objects.all()
for user in users.iterator():
    print user.username

Even for smaller querysets, it’s not a bad idea to use the iterator() method if you know you’re not going to reuse the queryset.
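As a rough illustration of the memory trade-off, the sketch below contrasts a cached loop with a streaming one. It is a toy model only: ToyQuerySet, _fetch_rows, and cached_row_count are hypothetical names made up for this example, not Django APIs (only the iterator() method name mirrors the real one).

```python
class ToyQuerySet:
    """Toy queryset that can either cache its rows or stream them."""

    def __init__(self, row_count):
        self._row_count = row_count
        self._result_cache = None

    def _fetch_rows(self):
        # Generator standing in for rows arriving from the database.
        for i in range(self._row_count):
            yield "user%d" % i

    def __iter__(self):
        # Plain iteration keeps every row around for later reuse.
        if self._result_cache is None:
            self._result_cache = list(self._fetch_rows())
        return iter(self._result_cache)

    def iterator(self):
        # Streaming iteration hands rows out one by one and keeps nothing.
        return self._fetch_rows()

    def cached_row_count(self):
        return 0 if self._result_cache is None else len(self._result_cache)


cached = ToyQuerySet(1000)
for username in cached:
    pass

streamed = ToyQuerySet(1000)
for username in streamed.iterator():
    pass

print(cached.cached_row_count(), streamed.cached_row_count())  # 1000 0
```

The price of streaming is that nothing is remembered, so looping over the same queryset a second time means fetching the rows again.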


4. Be careful with model properties/methods that do database lookups

class Profile(models.Model):
    
    user = models.ForeignKey(User)
    
    @property
    def username(self):
        return self.user.username

There’s nothing necessarily wrong with this, but it’s dangerous to expose properties or methods that hide database lookups. If you’re working with designers who may not know what your schema looks like, a property like this makes it easy to write a loop that quietly does one User lookup per profile:

{% for profile in profiles %}
    <li>{{ profile.username }}</li>
{% endfor %}

whereas it’s much easier to see that

{% for profile in profiles %}
    <li>{{ profile.user.username }}</li>
{% endfor %}

will do N User lookups. If for some reason you find that you do need to create a property that does a database lookup, make it private.

class Profile(models.Model):
    
    user = models.ForeignKey(User)
    
    @property
    def _username(self):
        return self.user.username

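Django templates refuse to resolve attributes whose names begin with an underscore, so private properties and methods can’t be used in templates, and it becomes much harder for a designer to shoot your site in the foot.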
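Django's template engine rejects variables whose attribute names start with an underscore when it parses the template. The sketch below imitates that rule in plain Python; resolve, UnderscoreLookupError, and this stripped-down Profile are hypothetical stand-ins written for illustration, not Django's own code.

```python
class UnderscoreLookupError(Exception):
    """Raised when a template variable tries to reach a private name."""


def resolve(context, dotted_name):
    """Resolve 'a.b.c' against a context dict, refusing private parts."""
    parts = dotted_name.split(".")
    for part in parts:
        if part.startswith("_"):
            raise UnderscoreLookupError(dotted_name)
    value = context[parts[0]]
    for part in parts[1:]:
        value = getattr(value, part)
    return value


class Profile:
    def __init__(self, username):
        self._stored_username = username

    @property
    def username(self):       # public, so reachable from a template
        return self._stored_username

    @property
    def _username(self):      # private, so hidden from templates
        return self._stored_username


context = {"profile": Profile("alice")}
public_value = resolve(context, "profile.username")

try:
    resolve(context, "profile._username")
    private_blocked = False
except UnderscoreLookupError:
    private_blocked = True

print(public_value, private_blocked)  # alice True
```

Python code can still call profile._username directly, so the expensive lookup stays available to people who know what it costs.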

Hopefully this was helpful for someone.
