🔙 N+1 & Eager loading & Beyond
The N+1 problem is one of the most common issues when our applications grow. That is frequently associated with ORMs because their abstractions can hide the resulting queries executed. However, it's not limited to them since we can always manually fetch data in a manner that will culminate in that problem.
Imagine we have
Match, and we want to report the last 100 matches with their duration. In an ORM like ActiveRecord that could look like this:
Match.last(100).map do |match| Report.call(duration: match.duration) end
Now imagine that for every
Match played we have an associated
Game. So besides the match itself, we want to report the game name, for example.
Match.last(100).map do |match| Report.call(duration: match.duration, game_name: match.game.name) end
So even though
Game is in another table that is tied to
Match via the association, so Active Record will make use of lazy loading to load the required data for us. Lazy loading is handy because you can easily query related data on demand.
SELECT * FROM matches ORDER BY matches.id DESC LIMIT 100; SELECT * FROM games WHERE games.match_id = 100; SELECT * FROM games WHERE games.match_id = 99; # 98 queries later... SELECT * FROM games WHERE games.match_id = 1;
However, the problem that was introduced here is hidden in the queries level. Because we are lazy loading the games on every loop iteration that will lead to a new SQL query to the database every time. The resulting number of queries would be
1 for the matches + 100 for loading games = 101 queries —boom that's our N+1.
Eager loading is another strategy to prevent N+1. The strategy consists in loading upfront any data of interest so whenever you need to access that data it would already be available in memory.
Match.includes(:game).last(100).map do |match| Report.call(duration: match.duration, game_name: match.game.name) end
The code uses the
includes query method to indicate what relationships we need to query alongside the
Matches one, it does so by leveraging the relationship between
Games. Active Record will ensure that all of the specified associations are loaded using the minimum possible number of queries. It could do a join and or an additional SQL, but no lazy load anymore.
SELECT * FROM matches ORDER BY matches.id DESC LIMIT 100; SELECT * FROM games WHERE games.match_id IN (100..1);
Preventing lazy loading 🙅♂️
One alternative is to stop working with Models and transforming them into hashes, or other in-memory only entities. Having that, we can safely operate without worrying about dispatching queries, but the trade-off is that we lose that rich API and we need to re-expose any functionality attached to the Model.
Another alternative is to use tools that help us to track and identify the N+1 issues early in the process. Packages like bullet alert us when we should add eager loading to queries or when you should cache, for example. We the tools work for us but we are still reactive to alerts (on prod or dev environments).
Rails 6.1 was released with strict loading and that introduces an optional strict mode for models to prevent lazy load!
class Match < ApplicationRecord # attrs... has_one: :game, strict_loading: true end class Game < ApplicationRecord # attrs... end Match.first.game.name # => ActiveRecord::StrictLoadingViolationError Exception: Match is marked as strict_loading and Game cannot be lazily loaded.
We can set up the
strict_loading at the application level, the model level or association level. If we ever try to lazy load the association we will get an error 🎉 . That is great because we are proactive and we cannot create lazy queries in the first place!
Pros and Cons
The clear benefit is that we avoid flooding the data source with individual queries for every relationship inside the loop. That reduces the risk of the calls, the load in the data sources and generally ends up being more performant.
The caveat is that these single queries to all records and its relationships are not much more expensive than the ones we run on every loop iteration. So if you need to load everything at some point, then eager loading should be adequate.
Of course, that comes with a memory cost to load everything upfront. And ultimately if you in our code you end up not using all the data queried that you're loading then you might be wasting memory and slowing down some queries.
Beyond relational Eager Loading 🚀
Sometimes we need to load data that is not explicitly related in the database, that can be fetched from other data source (APIs, cache, DBs, ...) so we cannot leverage frameworks’ features like the one from
ActiveRecord. However, it is possible to implement eager loading by ourselves or use other patterns that avoid N+1 in similar manners, like data-loaders.
If your process is very data-intensive or is not a web request-response you might need to look into further alternatives. We could consider caching, denormalising the data, preprocessing data, batch vs streaming processing, etc. These are all big topics that deserve much more elaboration than this post aims to do. If you are interested in these topics, I recommend Designing Data-Intensive applications which cover these topics in a great manner.
In summary, whatever implementation the principle is still the same: use abstractions to query all necessary data with the minimum amount of queries possible.
I might explore more of these topics later, but for now, this is it, see you later 👋
Let me know what you think about this post on twitter!