I was using the Rails cache with Redis and I quickly overflowed the memory storage, so I went on a small quest to better understand the Rails cache implementation. I thought it was worth writing a bit about it.
TL;DR: use your cache client directly or pass the raw option as true to the Rails.cache methods.
Rails provides a comprehensive and easy-to-use interface for caching: the cache store. It offers a common interface to any of the standard cache implementations that Rails ships out of the box, from the in-memory and file stores to Memcached and Redis.
The cache implementation is very convenient because it allows us to store anything from HTML partials to models and complex classes. The best part is that it abstracts the whole serialization, so you always end up with workable entities without needing to worry about a thing.
> game = Game.last
=> #<Game id: 1, name: "Pokemon", created_at: "2021-01-14 12:10:59.872271000 +0000", updated_at: "2021-01-14 12:10:59.872271000 +0000">
> Rails.cache.write('pokemon', game)
=> "OK"
> pokemon = Rails.cache.read('pokemon')
=> #<Game id: 1, name: "Pokemon", created_at: "2021-01-14 12:10:59.872271000 +0000", updated_at: "2021-01-14 12:10:59.872271000 +0000">
> pokemon.name
=> "Pokemon"
In the example above we load a record from the Games table, then we cache that entity using the Rails.cache.write method. When retrieving the cache entry with its key, we end up with the same model class we were using before, and we can even call its methods and attributes as expected. That's super cool, isn't it!? But how does Rails do it?
# https://github.com/rails/rails/blob/291a3d2ef29a3842d1156ada7526f4ee60dd2b59/activesupport/lib/active_support/cache.rb#L598-L600
def serialize_entry(entry)
  @coder.dump(entry)
end
The answer is in the snippet above from the cache store implementation, and in what the @coder instance holds: an instance of the Marshal library.
The marshaling library converts collections of Ruby objects into a byte stream, allowing them to be stored outside the currently active script. This data may subsequently be read and the original objects reconstituted.
Before writing or after reading any record, the cache store will serialize or deserialize the entry by default, using the Marshal library to do so. In that way, the magic is done for us and we can read and write any Ruby object 🥳!
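To make that a bit more concrete, here is a minimal sketch of a Marshal round trip on a plain Ruby object, outside of Rails entirely (the Point struct is just an illustrative stand-in for whatever you might cache):
# Any marshalable Ruby object works; a Struct keeps the example small.
Point = Struct.new(:x, :y)

# Marshal.dump converts the object into a byte stream...
bytes = Marshal.dump(Point.new(1, 2))

# ...and Marshal.load reconstitutes an equivalent object from those bytes.
point = Marshal.load(bytes)
point.x
# => 1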
The storage cost of simple objects
Let’s now set this learning aside for a moment and analyze another example. Imagine we want to store a boolean.
> Rails.cache.write('yes', true)
=> "OK"
> Rails.cache.fetch('yes')
=> true
Rails is able to store and retrieve it without any issues.
That said, we would expect the value stored in the cache to be a stringified version of the boolean, right? To confirm that, let’s connect directly to the storage and inspect the values there.
— In my case, I’m using Redis as the cache, so I just instantiate its client to connect directly to it.
After getting the yes value it is clear that we have much more than “true”.
> redis = Redis.new
=> #<Redis client v4.1.4 for redis://127.0.0.1:6379/0>
> redis.get('yes')
=> "\\u0004\\bo: ActiveSupport::Cache::Entry\\t:\\v@valueT:\\r@version0:\\u0010@created_atf\\u00161609929749.567886:\\u0010@expires_at0"
What ends up being stored is the serialized version of an ActiveSupport::Cache::Entry instance. The Entry class is an abstraction that implements expiration, compression and versioning of any cache record. Through this class, Rails can implement these features independently from the actual storage used behind it.
The cache entry class encapsulates whatever value we store in the cache by default. Leveraging the Marshal lib, the Rails cache is capable of storing any simple or complex object while offering the cache features. That is great!
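If you are curious, you can even take those bytes and feed them back through Marshal yourself. Assuming the same redis client as before, something like this should hand us back the Entry instance and the value it wraps:
# The bytes Redis holds are just a marshaled ActiveSupport::Cache::Entry.
entry = Marshal.load(redis.get('yes'))
entry.class
# => ActiveSupport::Cache::Entry
entry.value
# => true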
In our previous example, the serialized version of the cache entry is a String of 100 chars instead of a 4-char String — true. That is an extra 96 chars for storing the same information.
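We can double-check those numbers from the console; a rough comparison, again assuming the redis client from earlier:
# Size of what the Rails cache actually stored (the marshaled Entry).
redis.get('yes').bytesize
# => roughly 100, the exact number depends on the entry's attributes

# Size of the value we actually care about.
'true'.bytesize
# => 4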
While for most cases that is totally fine, what if you really need to care about the amount of stored data?
To understand the impact of these extra chars let’s elaborate more on our example.
Short detour: Redis is implemented in C and it probably needs a few extra bytes to maintain our String value, which is an array of chars underneath. But let’s not consider that, since those extra bytes are the same for all String values.
Knowing we need 1B to store 1 char in C, we can conclude we would need 100B to store the serialized version of the cache entry.
Now, for 1 million records with the value true, we would need 100MB (1M * 100B). This example is “simple” and 100MB may not sound like a lot, but if you need to store a little bit more than a boolean, if you are using the in-memory store, or if you have limited space in Redis, that can start hurting.
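To put numbers on it, a quick back-of-the-envelope comparison using the rough sizes from above:
# Back-of-the-envelope math for 1 million cached booleans.
entries     = 1_000_000
entry_bytes = 100   # marshaled ActiveSupport::Cache::Entry
raw_bytes   = 4     # the plain string "true"

(entries * entry_bytes) / 1_000_000 # => 100 MB
(entries * raw_bytes)   / 1_000_000 # => 4 MB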
The Alternatives
The direct alternative I could think of was to use the Redis client directly instead of the Rails.cache abstraction.
> redis.set('no', false)
=> "OK"
> redis.get('no')
=> "false"
It should work as expected, and we are no longer using the extra space for that value 🙌🏽. We are then left with the job of parsing that string back to a boolean value.
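Since Redis only hands us back strings, that parsing step can be as small as a comparison; a minimal sketch:
# Redis stores and returns plain strings, so we convert them back ourselves.
raw = redis.get('no')
# => "false"
raw == 'true'
# => false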
Another alternative that I found after looking at the Redis cache store implementation on GitHub was to pass down the raw option.
> Rails.cache.write('yes', true, raw: true)
=> "OK"
> redis.get("yes")
=> "true"
> Rails.cache.read('yes', raw: true)
=> "true"
This option is only mentioned in the Memcached part of the docs, but it is also supported by the Redis cache store implementation, as it overrides the default serialize_entry method [1]. Similar to using the Redis client directly, we will need to parse the resulting string back to a boolean manually. Even though we lose the Entry features, that is not a big deal if you are using Redis or Memcached, since they provide most of these features out of the box.
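For instance, expiration should still behave as expected with raw values, since the Redis cache store passes the expiration down to Redis itself instead of relying on the Entry wrapper. A quick sketch (the one-hour TTL is arbitrary):
# The value is stored as a plain string, but Redis itself owns the expiration.
Rails.cache.write('yes', true, raw: true, expires_in: 1.hour)

redis.get('yes')
# => "true"
redis.ttl('yes')
# => roughly 3600 (seconds remaining)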
Conclusions
Thanks a lot if you got this far!
The level of caution that this post brings to the usage of the Rails cache is, most of the time, not required. However, if you ever want to cache millions of simple objects, knowing some of these details can make a difference!
See you next time!
Let me know what you think about this post on twitter!