How to store raw values with Rails.cache

1/25/2021

I was using the Rails cache with Redis and quickly ran out of memory, so I went on a small quest to better understand the Rails cache implementation. I thought it was worth writing a bit about it.

TL;DR: use your cache client directly, or pass raw: true to the Rails.cache methods.

Rails provides a comprehensive and easy-to-use interface for caching: the cache store. It offers a common interface to all the standard cache implementations that Rails ships out of the box, from the in-memory cache to file, Memcached and Redis.

The cache implementation is very convenient because it allows us to store anything from HTML partials to models and complex classes. The best part is that it abstracts away the whole serialization, so you always end up with workable entities without needing to worry about a thing.

> game = Game.last
 => #<Game id: 1, name: "Pokemon", created_at: "2021-01-14 12:10:59.872271000 +0000", updated_at: "2021-01-14 12:10:59.872271000 +0000">
> Rails.cache.write('pokemon', game)
 => "OK"
> pokemon = Rails.cache.read('pokemon')
 => #<Game id: 1, name: "Pokemon", created_at: "2021-01-14 12:10:59.872271000 +0000", updated_at: "2021-01-14 12:10:59.872271000 +0000">
> pokemon.name
 => "Pokemon"

In the example above we load a record from the games table and then cache it using the Rails.cache.write method. When retrieving the cache entry by its key we end up with an instance of the same model class we were using before, and we can even call its methods and attributes as expected. That's super cool, isn't it!? But how does Rails do it?

# https://github.com/rails/rails/blob/291a3d2ef29a3842d1156ada7526f4ee60dd2b59/activesupport/lib/active_support/cache.rb#L598-L600
def serialize_entry(entry)
  @coder.dump(entry)
end

The answer is in the snippet above, taken from the cache store implementation, and in what @coder holds: by default, it points to Ruby's Marshal module.

The marshaling library converts collections of Ruby objects into a byte stream, allowing them to be stored outside the currently active script. This data may subsequently be read and the original objects reconstituted.

Before writing any record, the cache store serializes the entry, using the Marshal library by default, and after reading a record it deserializes the entry again. That way the magic is done for us and we can read and write almost any Ruby object 🥳!
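To make the round trip concrete, here is a minimal sketch that uses only Ruby's standard Marshal module. The Game struct is a hypothetical stand-in for a model, not the ActiveRecord class from the example above.

```ruby
# Hypothetical stand-in for a model; a plain Struct, not an ActiveRecord class.
Game = Struct.new(:id, :name)

game = Game.new(1, "Pokemon")

# Marshal.dump turns the object into a binary String...
bytes = Marshal.dump(game)

# ...and Marshal.load rebuilds an equal object from those bytes.
restored = Marshal.load(bytes)

puts bytes.class      # prints "String"
puts restored.name    # prints "Pokemon"
puts restored == game # prints "true"
```

The byte stream is what ends up in the cache storage, which is why the value read back behaves like the original object.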

Simple objects storage cost

Let’s now set this aside for a moment and look at another example. Imagine we want to store a boolean.

> Rails.cache.write('yes', true)
 => "OK"
> Rails.cache.fetch('yes')
 => true

Rails is able to store and retrieve it without any issues.

That said, we would expect the value stored in the cache to be a stringified version of the boolean, right? To confirm that, let’s connect directly to the storage and inspect the values there.

In my case, I’m using Redis as the cache, so I just create a new instance of its client to connect directly to it.

After getting the yes value, it is clear that we have much more than “true”.

> redis = Redis.new
 => #<Redis client v4.1.4 for redis://127.0.0.1:6379/0>
> redis.get('yes')
 => "\\u0004\\bo: ActiveSupport::Cache::Entry\\t:\\v@valueT:\\r@version0:\\u0010@created_atf\\u00161609929749.567886:\\u0010@expires_at0"

What ends up being stored is the serialized version of an ActiveSupport::Cache::Entry instance. The Entry class is an abstraction that implements expiration, compression and versioning of any cache record. Through this class, Rails can implement these features independently from the actual storage used behind it.

By default, the cache entry class wraps whatever value we store in the cache. By leveraging the Marshal lib, the Rails cache can store any simple or complex object while still offering the cache features. That is great!

In our previous example, the serialized version of the cache entry is a String of about 100 chars instead of a 4-char String, true. That is roughly 96 extra chars for storing the same information.
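The overhead can be reproduced without Rails. The sketch below uses a hypothetical FakeEntry class that only mimics the instance variables visible in the Redis dump above; it is not the real ActiveSupport::Cache::Entry, so the exact byte count will differ from the one stored by Rails.

```ruby
# Hypothetical stand-in for the cache entry wrapper. It only copies the
# instance variable names seen in the Redis dump, not the real class.
class FakeEntry
  def initialize(value)
    @value = value
    @version = nil
    @created_at = Time.now.to_f
    @expires_at = nil
  end
end

# Serialized wrapper: class name, ivar names and timestamp all get stored.
wrapped = Marshal.dump(FakeEntry.new(true))

# Raw value: just the stringified boolean.
raw = true.to_s

puts "wrapped: #{wrapped.bytesize} bytes"
puts "raw:     #{raw.bytesize} bytes"
```

Most of the wrapped size comes from the class name, the instance variable names and the timestamp, none of which depend on the value being cached.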

While in most cases that is totally fine, what if you really need to care about the amount of stored data?

To understand the impact of these extra chars let’s elaborate more on our example.

Short detour: Redis is implemented in C, and it probably needs a few extra bytes to maintain our String value, which is an array of chars underneath. But let’s not consider that, since the same overhead applies to every String value.

Knowing that a char takes 1B in C, we can conclude that we need about 100B to store the serialized version of the cache Entry.

Now, for 1 million records with the value true, we would need 100MB (1M * 100B). This example is “simple” and 100MB may not sound like a lot, but if you need to store a little more than a boolean, if you are using the in-memory store, or if you have limited space in Redis, that can start hurting.
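The same back-of-the-envelope estimate, written out as a small sketch. The per-record sizes are the approximate figures from this post, and the calculation counts only the values, ignoring keys and any overhead Redis adds per entry.

```ruby
records = 1_000_000

wrapped_bytes_per_record = 100 # approximate size of the serialized entry
raw_bytes_per_record = 4       # the String "true"

# Decimal megabytes: 1MB = 1,000,000B.
wrapped_total_mb = records * wrapped_bytes_per_record / 1_000_000.0
raw_total_mb = records * raw_bytes_per_record / 1_000_000.0

puts "wrapped: #{wrapped_total_mb} MB" # prints "wrapped: 100.0 MB"
puts "raw:     #{raw_total_mb} MB"     # prints "raw:     4.0 MB"
```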

The Alternatives

The most direct alternative I could think of was to use the Redis client directly instead of the Rails.cache abstraction.

> redis.set('no', false)
 => "OK"
> redis.get('no')
 => "false"

It works as expected and we are no longer using the extra space for that value 🙌🏽. We are then left with the job of parsing that String back to a boolean value.
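Parsing the value back needs a little care, because in Ruby any non-nil String is truthy, so even "false" would pass an if check. Here is a minimal sketch of a conversion helper; parse_cached_boolean is a hypothetical name, not part of Redis or Rails.

```ruby
# Hypothetical helper: converts the String returned by the cache client
# back into a boolean. A missing key (nil) stays nil, and anything
# unexpected raises instead of being silently treated as true.
def parse_cached_boolean(value)
  case value
  when "true" then true
  when "false" then false
  when nil then nil
  else raise ArgumentError, "unexpected cached value: #{value.inspect}"
  end
end

puts parse_cached_boolean("false").inspect # prints "false"
puts parse_cached_boolean(nil).inspect     # prints "nil"
```

With the Redis client from the example above, this would be called as parse_cached_boolean(redis.get('no')).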

Another alternative that I found after looking at the Redis cache store implementation on GitHub was to pass down the raw option.

> Rails.cache.write('yes', true, raw: true)
 => "OK"
> redis.get("yes")
 => "true"
> Rails.cache.read('yes', raw: true)
 => "true"

This option is only mentioned in the Memcached part of the docs, but it is also supported by the Redis cache store implementation, which overrides the default serialize_entry method [1]. As with using the Redis client directly, we need to parse the resulting String back to a boolean manually. Even though we lose the Entry features, that is not a big deal if you are using Redis or Memcached, since they provide most of these features out of the box.

Conclusions

Thanks a lot if you got this far!

The level of caution that this post brings to the usage of the Rails cache is, most of the time, not required. However, if you ever want to cache millions of simple objects, knowing some of these details can make a difference!

See you next time!

Let me know what you think about this post on twitter!