
Performance issue while deserializing a large collection of user-defined objects from Redis via protobuf-net

Issue: Facing slow performance while deserializing the bytes received from Redis.

I am using Redis for distributed caching in my ASP.NET web application.

To communicate with Redis from my application, I am using StackExchange.Redis.

To serialize my DTOs into the bytes sent to the server, and to deserialize the bytes received back into DTOs, I am using protobuf-net.

My goal is to store a dictionary of 100,000 users (Dictionary<int, User>) in Redis and retrieve it multiple times during a single request.

That dictionary resides under the MyContext.Current.Users property. The key of the dictionary is the user id and the value is the complete DTO. The issue I have right now is that it takes 1.5-2 seconds to deserialize the 100,000 users from the bytes Redis gives me, and I have to use that property multiple times per request.

public Dictionary<int, User> Users
{
    get
    {
        // Get the users from the Redis cache.
        // If they are not cached yet, save them to the Redis cache first, then get them.
    }
}


Users is the property exposed in my context wrapper class.
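
Roughly, the getter does the following (a sketch only; db, cacheKey and slidingExpiration are fields on my wrapper, and LoadUsersFromDatabase() is just a placeholder for however the dictionary is originally built):

public Dictionary<int, User> Users
{
    get
    {
        // Try the Redis cache first.
        byte[] bytes = (byte[])db.StringGet(cacheKey);
        if (bytes != null)
        {
            return ByteArrayToObjectFromProtoBuff<Dictionary<int, User>>(bytes);
        }

        // Not cached yet: build it, store it, return it.
        Dictionary<int, User> users = LoadUsersFromDatabase(); // placeholder helper
        db.StringSet(cacheKey, ObjectToByteArrayFromProtoBuff(users), slidingExpiration);
        return users;
    }
}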

Here is the DTO I have for User (it has more than 100 properties):

[ProtoContract]
public class User
{
    [ProtoMember(1)]
    public string UserName { get; set; }

    [ProtoMember(2)]
    public string UserID { get; set; }

    [ProtoMember(3)]
    public string FirstName { get; set; }

    // ... more than 100 properties in total ...
}


Here is a snippet of the code I use to talk to Redis with the help of StackExchange.Redis:

At the time of storage I convert my DTO to bytes so that it can be stored in Redis via the

db.StringSet(cacheKey, bytes, slidingExpiration)

command, using this helper:

private byte[] ObjectToByteArrayFromProtoBuff(object obj)
{
    if (obj == null)
    {
        return null;
    }

    using (MemoryStream ms = new MemoryStream())
    {
        Serializer.Serialize(ms, obj);
        return ms.ToArray();
    }
}


At the time of fetching I convert the bytes received from the

db.StringGet(cacheKey);

command back into the DTO, using this helper:

private T ByteArrayToObjectFromProtoBuff<T>(byte[] arrBytes)
{
    if (arrBytes != null)
    {
        using (MemoryStream ms = new MemoryStream(arrBytes))
        {
            var obj = Serializer.Deserialize<T>(ms);
            return obj;
        }
    }
    return default(T);
}


Here is a screenshot from ANTS Performance Profiler showing the time protobuf-net takes to deserialize those 100,000 users from the bytes Redis returns.

[ANTS Performance Profiler screenshot]

As you can see, the average time taken to deserialize the bytes into the dictionary of users (Dictionary<int, User>) is around 1.5 to 2 seconds, which is too much, since I use that property in many places to fetch user information from that dictionary.

Can you let me know what I am doing wrong here?

Is it good to deserialize the 100,000-user list from Redis into the application every time and then use it? (Each request additionally has to deserialize it wherever the Users property is used while processing the request.)

Is it correct to store a dictionary/collection/list of users, or any other large collection, in Redis as bytes and then get it back by deserializing it every time we have to use it?

According to the following post, Does Stack Exchange use caching and if so, how?, I learned that Stack Exchange uses Redis heavily. I believe my 100,000 users are far fewer, and their size (around 60-80 MB) is far smaller, than what Stack Exchange and other sites (Facebook etc.) handle. How does Stack Overflow deserialize such a big list of users/top questions and many other cached items so fast?

Can't I keep a dictionary of 100,000 users (with each item in that list having more than 100 properties) in the cache and deserialize it multiple times within a single request, or on every request?

I have no issue with that list/dictionary when I use HttpRuntime.Cache as the cache provider, but when I switch to Redis the deserialization part causes the hindrance, as it is still slow.

I would like to add one more detail to this post. Previously I was using BinaryFormatter to deserialize that list, and it was almost 10 times slower than the protobuf-net I am using right now. But even with protobuf-net it takes 1.5 to 2 seconds on average to deserialize those users from bytes, which is still slow, since that property has to be used many times in the code.

Answer

Yes, if you try to transfer a large collection of many objects, you will always have to pay bandwidth + deserialization costs for the entire graph. The key here is: not to do that. Fetching a list of 100,000 users multiple times per request seems entirely unnecessary and very much a performance bottleneck.

There are two common approaches:

  • work with the large object (the Dictionary<,>), but fetch it only very occasionally - as in: in the background, every 5 minutes, or when you know it has changed via pub/sub (a rough sketch of this follows the list)
  • work with just the discrete objects you need per request, and leave the rest at the redis server; only fetch each one at most once per request
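
For the first option, a minimal background-refresh sketch might look something like this (the class name, the "users" cache key and the "users:changed" channel are illustrative only; the deserialization call is the same protobuf-net call used in the question):

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using ProtoBuf;
using StackExchange.Redis;

public static class LocalUserCache
{
    // The already-deserialized dictionary lives in-process; requests read this
    // field directly instead of hitting Redis and paying the deserialization
    // cost on every access.
    private static volatile Dictionary<int, User> _users = new Dictionary<int, User>();
    private static Timer _refreshTimer;

    public static Dictionary<int, User> Users
    {
        get { return _users; }
    }

    public static void Start(ConnectionMultiplexer muxer)
    {
        // Refresh in the background every 5 minutes...
        _refreshTimer = new Timer(_ => Refresh(muxer.GetDatabase()), null,
                                  TimeSpan.Zero, TimeSpan.FromMinutes(5));

        // ...and/or refresh when a publisher announces that the data changed.
        muxer.GetSubscriber().Subscribe("users:changed",
            (channel, message) => Refresh(muxer.GetDatabase()));
    }

    private static void Refresh(IDatabase db)
    {
        byte[] bytes = (byte[])db.StringGet("users"); // illustrative cache key
        if (bytes == null) return;
        using (var ms = new MemoryStream(bytes))
        {
            _users = Serializer.Deserialize<Dictionary<int, User>>(ms);
        }
    }
}

The trade-off is staleness: requests read the in-process copy for free, but the data may be up to one refresh interval out of date.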

Either approach is fine, and which you prefer may depend on things like your request rate vs the data change rate, and how up to date you require the data to be. For the second approach, for example, you could consider using a redis hash, where the key is much like you are using now, the hash-slot key is the int (or some string / binary representation thereof), and the hash-slot value is the serialized form of the single User instance. The advantage of using a hash here (as opposed to per-user strings) is that you can still fetch / purge / etc all users at once via the redis hash commands (hgetall, for example). All of the required hash operations are available in SE.Redis with the Hash* prefix.
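
To make that concrete, a rough sketch of the hash-based shape with StackExchange.Redis and the same protobuf-net serialization might look like this (the class name, the "users:all" hash key and the Serialize/Deserialize helper names are illustrative, not part of either library):

using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;
using StackExchange.Redis;

public class UserHashCache
{
    private const string HashKey = "users:all"; // illustrative key name
    private readonly IDatabase db;

    public UserHashCache(IDatabase db)
    {
        this.db = db;
    }

    // Store once (or whenever the data changes): one hash field per user id,
    // the value being the protobuf-net serialized form of that single User.
    public void StoreAll(Dictionary<int, User> users)
    {
        HashEntry[] entries = users
            .Select(kvp => new HashEntry(kvp.Key, Serialize(kvp.Value)))
            .ToArray();
        db.HashSet(HashKey, entries);
    }

    // Per request: fetch and deserialize only the users you actually touch (HGET).
    public User GetUser(int userId)
    {
        RedisValue raw = db.HashGet(HashKey, userId);
        return raw.IsNull ? null : Deserialize<User>((byte[])raw);
    }

    // Everything is still reachable in one call when genuinely needed (HGETALL).
    public Dictionary<int, User> GetAll()
    {
        return db.HashGetAll(HashKey).ToDictionary(
            e => (int)e.Name,
            e => Deserialize<User>((byte[])e.Value));
    }

    // Same protobuf-net round-trip as in the question, just applied per user.
    private static byte[] Serialize<T>(T value)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, value);
            return ms.ToArray();
        }
    }

    private static T Deserialize<T>(byte[] bytes)
    {
        using (var ms = new MemoryStream(bytes))
        {
            return Serializer.Deserialize<T>(ms);
        }
    }
}

With this shape, a request that touches, say, 20 users deserializes 20 small objects instead of the whole 100,000-entry graph, and the hash commands (HashDelete, HashGetAll, KeyDelete) still let you purge or rebuild a single user or the whole set.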