I recently did quite a bit of work on a project that uses Redis for its primary method of data storage. Like many developers, I’ve used Redis before for simple key/value retrieval, but not for much else. What I ended up learning is that Redis works phenomenally for a few very specific functionalities, one of which is very low-cost sorting of data sets. This works really great for leaderboard-type implementations, which is what I used it for on Treehouse. Ultimately, we ended up with a starting Sorted Set including several dozens of thousands of records, based on points and badges that Treehouse students earn while learning.

If we were to have implemented a leaderboard without Redis, we would have run into many other pitfalls, limitations, and shortcomings, like having to recalculate the leaderboard at periodic intervals, possibly not having the leaderboard be truly “realtime,” or doing expensive lookups using Active Record–none of these scenarios are ideal. But a lot of people don’t realize that Redis can operate on sorted sets or what exactly it’s capable of doing, so I thought I’d do a brief walkthrough. Note that I am a developer, not a mathematician or a statistician. Although I do enjoy both of those topics a bit more than I ever thought I would, I am not formally educated in either topic, just a geek with a strong passion!

Also, I use Redis’s Ruby library for managing all of this, but I’ll keep most of this walkthrough open-ended by only using non-language-specific input and commands sent directly to the Redis client, which you can find in Redis’ documentation. I may expound here and there with Ruby to show how you can further manipulate results, but understand that you likely can do all the same manipulation using the language of your choosing.

What exactly is a Sorted Set?

A sorted set is a non-repeating collection of unique Strings that is associated with a score. For example, I may have a Sorted Set with two members/records in it: one for “bob”, who has a score of 30 and one for “jane”, who has a score of 90. Once “jane” is added to the Sorted Set, she cannot be added again, only updated or removed–she is a unique member of the set. All write operations on the Sorted Set are performed at O(log n), which is pretty good.

For the sake of this walkthrough, let’s pretend we’re building a scored leaderboard, keyed by username, because that’s the most common use case for this data type. You might also key this on numeric user ID as well, but for the sake of making everything easier to follow, I’m using alphanumeric usernames instead in this example.

Basic Operations

Most of the basic add and update operations on Sorted Sets with Redis are fairly similar to other data types, with a few small caveats.

Adding a new entry

127.0.0.1:6379 > ZADD "leaderboard" 30 "aimee"

Let’s break this down. “leaderboard” is the name of the key we are storing the leaderboard on. You could call this anything you want, but if you are working within an application that uses Redis for other tasks, it’s probably wise to make it semantically logical, so we’re sticking with “leaderboard” for simplicity. 30 is the score that we are assigning to “aimee”. If “aimee” is already included in the set, this command will update the existing entry and reinsert it based on its revised score and recalculated position within the Sorted Set.

ZADD is great if you are adding the entry for the first time with their initial score. If you are updating an existing score, it’s wiser to use ZINCRBY to save yourself from having to do costly lookups on your database, calculating the user’s total score. The idea here is that once your Redis data store has the user’s correct score, it should be in sync with your database and the score can be simply incremented for better performance.

Incrementing score

127.0.0.1:6379 > ZINCRBY "leaderboard" 5 "aimee"

Like ZADD, we have almost the same parameters accepted here, except that the numeric value passed in is a value that will be added to the existing score. If “aimee” is not added yet, a new record will be added that increments from a base score of 0, making her total score 5. If “aimee” were to have an existing score of 30, she would have a score of 35 after this command finished.

Both of these operations are fairly simple and shouldn’t be too mind-boggling. If they are for any reason, the easiest way to understand is to open up the Redis client (redis-cli) and play around! It’ll be a blast, I promise.

Getting a ranked list

Okay, here’s where the fun starts. Having just one member in my sorted set isn’t going to be that interesting, so let’s add a few more people!

127.0.0.1:6379 > ZADD "leaderboard" 10 "homer"
127.0.0.1:6379 > ZADD "leaderboard" 25 "marge"
127.0.0.1:6379 > ZADD "leaderboard" 55 "bart"
127.0.0.1:6379 > ZADD "leaderboard" 70 "lisa"
127.0.0.1:6379 > ZADD "leaderboard" 5 "maggie"
127.0.0.1:6379 > ZADD "leaderboard" 80 "frank grimes"

There’s multiple ways you can sort this list. Most people like to sort leaderboards by score, ranking highest first, so let’s try that out first.

127.0.0.1:6379 > ZREVRANGE "leaderboard" 0 10

This asks Redis to fetch scores starting from the 0th position. The 10 denotes the number of rows we want to fetch for our leaderboard.

1) "frank grimes"
2) "lisa"
3) "bart"
4) "marge"
5) "homer"
6) "aimee"
7) "maggie"

Well, that’s great, but it doesn’t give us scores… and we kind of want that, right? Let’s try this again.

127.0.0.1:6379 > ZREVRANGE "leaderboard" 0 10 WITHSCORES
1) "frank grimes"
2) "80"
3) "lisa"
4) "70"
5) "bart"
6) "55"
7) "marge"
8) "25"
9) "homer"
10) "10"
11) "aimee"
12) "5"
13) "maggie"
14) "5"

Ah, there we go. Much better. But this is kind of a useless format of results, right? To make this easier to work with, I can manipulate it into a Hash in Ruby:

results = Hash[*redis.zrevrange("leaderboard", 0, 10, {withscores: true})]

And the results are much nicer to work with:

=> {"frank grimes"=>"80",
 "lisa"=>"70",
 "bart"=>"55",
 "marge"=>"25",
 "homer"=>"10",
 "aimee"=>"5",
 "maggie"=>"5"}

The opposite of ZREVRANGE, if you were curious, is ZRANGE, which sorts from low to high score. All sort and lookup operations on Sorted Sets perform at O(log(n) + M) complexity — where M is the number of records fetched, which again, is really good.

Some other sorting methods

As of Redis 2.8.9, there are a few other ways to sort results–I haven’t really found much use of them but I’m sure they’re useful to some:

ZRANGEBYLEX / ZREVRANGEBYLEX – sorts lexically/alphabetically by member name rather than score

127.0.0.1:6379>  ZREVRANGEBYLEX "leaderboard" 0 10

 1) "aimee"
 2) "bart"
 3) "frank grimes"
 4) "homer"
 5) "lisa"
 6) "maggie"
 7) "marge"

ZRANGEBYSCORE/ZREVRANGEBYSCORE – very similar to ZRANGE/ZREVRANGE, the only difference here is that the second and third parameters are no longer offset and limit but range boundaries for which scores you want to return.

So, if I were to run the following command:

127.0.0.1:6379> ZRANGEBYSCORE "leaderboard" 10 30

This would return all members who scored between 10 and 30:

 1) "aimee"
 2) "maggie"
 3) "homer"
 4) "marge"

There is a small caveat to this. If you use ZREVRANGEBYSCORE instead, you will need to swap the order of your min and max scores or else you will end up with an empty set 🙂

This is actually really useful if you wanted to do some advanced statistics. You could theoretically get the mean or median score, plus or minus standard deviation and compare outside data on members who have scored within that set to members who are clear outliers if you wanted to get a better understanding of your user base. Or, if you maintained several leaderboards, you can compare numbers between the different boards. There are several additional Sorted Set methods that help with things like this and I’ll get to them shortly.

Individual Ranking

Occasionally, you want to know the ranking of an individual member in a Sorted Set. This is pretty simply done:

127.0.0.1:6379> ZRANK "leaderboard" "frank grimes"
(integer) 6

But wait. Frank Grimes has the highest score… that’s ranked in ascending score order. We probably care about their rank from highest score, right?

127.0.0.1:6379> ZREVRANK "leaderboard" "frank grimes"
(integer) 0

That’s better. Note that the rank returned is always the indexed position of the member, not their cardinal order. If you are using this functionality in the context of an application, you might want to use a decorator or helper to display correct rank. If you try to perform either of these methods on a non-existent member, it will return a nil value.

Scores

And, of course, you can fetch an individual member’s score at any time using simply:

127.0.0.1:6379> ZSCORE "leaderboard" "frank grimes"
"80"

There’s not much to that one! Like, ZRANK, it’ll return nil if the member being queried doesn’t exist in the set.

The More Complicated Stuff

This is the stuff I don’t see getting mentioned a whole lot around the web, and really, it’s the most fascinating bits of Sorted Sets to me. One of the most gruesome parts of building a leaderboard, for me, was accepting the fact that you can’t really query, filter, or add scoping to the results. They are, for all intents and purposes, equivalent to an immutable Hash, so if you want your results to be filtered by “only members who fall under these particular rules and logic that Redis is agnostic to,” you’re going to have to do some additional lifting on your end.

For example, what if I only want to see a leaderboard of members who are between the ages of 18 and 49? The Sorted Set contains no information on how old each member is. And I don’t want to maintain a separate Redis Sorted Set for just these users, because that’s kind of redundant, right? You probably don’t want to fetch every single one of the potentially hundreds of thousands of records and filter individually… that’s not very efficient! There’s a couple of ways you could approach this with Redis. One is by way of Lua scripting with use of EVAL (which runs Lua) and ZSCAN (which can be used for matching keys and values by rules, similar to regular expressions).

Another way, which I found to be interesting, is by use of ZINTERSTORE or ZUNIONSTORE.ZINTERSTORE will find the intersection of two Sorted Sets and form a new Sorted Set of the results.ZUNIONSTORE will essentially merge two existing Sorted Sets together.

ZINTERSTORE in use:

127.0.0.1:6379> ZINTERSTORE "new_leaderboard" 10 "leaderboard" "temp_users_set"

Okay, that looks kind of confusing, I think!

So let’s look at this in application with Ruby:

def filter_leaderboard_by_username(usernames = [])
  redis.multi do
    user_ids.each do |user_id|
      redis.zadd("temp_users_set", 0, user_id)
    end
  end

  redis.zinterstore("new_leaderboard", ["leaderboard", "temp_users_set"])
  output = redis.zrevrange("new_leaderboard", 0, -1)

  redis.del("temp_users_set")
  redis.del("new_leaderboard")
  output           
end

We’d probably want to refactor this a little bit to work better in a high-traffic production environment, but this is a pretty quick way to filter from a significantly larger Sorted Set to pick off the members we want to see. I’m actually using a really cool feature of Redis here called multi, which queues commands to be executed atomically, which is kind of handy.

So, in the example above, we build a temporary Sorted Set (which we tear down immediately after processing it) containing the usernames passed in. We don’t really care about the scores on this set because the intersection will be weighted in favor of the larger board. There are options to change the weights here if you were doing something more complicated with this method–all you need to do is pass 2 additional numeric weight parameters for each Sorted Set. The above method will return the intersection of the two boards, which should be a filtered version of the larger board containing only members whose usernames were passed to the method.

Interstore has a bit more complexity than other Redis methods, operating at O(NK)+O(Mlog(M)), with N being the size of the smaller Sorted Set, M being the number of sets (you can actually use more than 2), and K being the size of the resulting set. But this likely is still faster than alternative approaches outside of Redis, especially as the size of your Sorted Set grows.

I’m still somewhat new to some of these methods and still have times where I have to RTFM, but I have found a lot of people are not even aware that some of this stuff exists, and would love to know how others are using this technology, especially for scenarios that deviate from my own use case!