πŸ”™ N+1 & Eager loading & Beyond

The N+1 problem is one of the most common issues when our applications grow. That is frequently associated with ORMs because their abstractions can hide the resulting queries executed. However, it's not limited to them since you can always manually write code and SQLs that will culminate in that problem.

Imagine you have Matches, and you want to list 100 matches with their dates and duration on your API or UI. In an ORM like ActiveRecord that could look like this:

Matches.limit(100).map do |match|
  puts "Match duration: #{match.duration}"
end

Now imagine that for every Match played we have an associated Game. So besides the match itself, we want to return the game name, for example.

Matches.limit(100).map do |match|
  puts "Match duration: #{match.duration} | Game: #{match.game.name}"
end

So even though Games are in another table that is tied to Matches via an association, so Active Record will make use of lazy loading to take care of it and load the required data for us. Lazy loading is handy because you can easily query related data on demand.

However, the problem that was introduced here is hidden in the queries level. Because we are lazy loading the games on every loop iteration that will lead to a new SQL query to the database every time. The resulting number of queries would be 1 for the matches + 100 for loading games = 101 queries β€”boom that's what we call N+1.

SELECT * FROM matches LIMIT 100;
SELECT * FROM games WHERE games.match_id = 1;
# 99 queries later...
SELECT * FROM games WHERE games.match_id = 100;

Preventing lazy loading πŸ™…β€β™‚οΈ

There are some ways we could prevent lazy loading from happening. In Rails we could use the to_a after the query to we convert the Active Record Collection to an Array, so you would query immediately, and would not be able to edit the initial query. However, that would not solve the lazy querying of the relations as we would still work with models.

Another alternative is stop working with Models and transforming them into hashes, or other in memory only entities. Having that, you can safely operate without worrying about dispatching queries, but the trade off is that you lose that rich API.

The most promising alternative, in my opinion, is detailed in the Rails 6.1 release notes, that will introduce an optional strict mode for models to prevent N+1. Keep an eye on it!

Strict loading associationsΒ allows you to ensure that all your associations are loaded eagerly and stop N+1's before they happen.


Eager Loading

Eager loading is another strategy to prevent N+1. The strategy consists in loading upfront any data of interest so whenever you need to access that data it would already be available in memory.

Matches.incudes(:games).limit(100).map do |match|
  puts "Match duration: #{match.duration} | Game: #{match.game.name}"
end

The code uses the includes query method to indicate what relationships we need to query alongside the Matches one, it does so by leveraging the relationship between Matches and Games. Active Record will ensure that all of the specified associations are loaded using the minimum possible number of queries. It could do a join and or an additional SQL, but no lazy load anymore.

SELECT * FROM matches LIMIT 100;
SELECT * FROM games WHERE games.match_id IN (1...100);

Pros and Cons

The clear benefit is that we avoid flooding the data source with individual queries for every relationship inside the loop. That reduces the risk of the calls, the load in the data sources and generally ends up being more performant.

The caveat is that these single queries to all records and its relationships are not much more expensive than the ones we run on every loop iteration. So if you need to load everything at some point, then eager loading should be adequate.

Of course, that comes with a memory cost to load everything upfront. And ultimately if you in our code you end up not using all the data queried that you're loading then you might be wasting memory and slowing down some queries.


https://media.giphy.com/media/yns3VgsP30GDm/giphy.gif

Beyond relational Eager Loading πŸš€

That solves a lot of problems, especially for web requests, but sometimes we need to load data that is not explicitly related in the database, or that is fetched from other data source (APIs, cache, DBs, ...) so we cannot leverage frameworks’ features like the one from ActiveRecord. However, it is possible to implement eager loading by ourselves or use other patterns that avoid N+1 in similar manners, like data-loaders.

If your process is very data-intensive or is not a web request-response you might need to look into further alternatives. We could consider caching, denormalising the data, preprocessing data, batch vs streaming processing, etc. These are all big topics that deserve much more elaboration than this post aims to do. If you are interested in these topics, I recommend Designing Data-Intensive applications which cover these topics in a great manner.

In summary, whatever implementation the principle is still the same: use abstractions to query all necessary data with the minimum amount of queries possible.

I might explore more of these topics later, but for now, this is it, see you later πŸ‘‹

Let me know what you think about this post on twitter!

peaonunes Β© 2021, now built with Gatsby