reddal reddal - 1 month ago 5
C# Question

Which collection class to use for a rolling period timeseries of data?

I want to imlpement a c# class (.NET 4.6.1) which contains time series data as follows :


  • the timeseries is a collection keyed on DateTime, each with an associated value (eg an array of doubles)

  • the points will be added strictly in time order

  • there will be a rolling time period - e.g. 1 hour - when adding a new point, any points at the start older than this period will be removed

  • the key issues for performance will be quickly adding new points, and finding the data point for a particular time (a binary search or better is essential). There will be 100k's of points in the collection sometimes.

  • it has to be thread safe

  • it has to be relatively memory efficient - e.g. it can't just keep all the points from the beginning of time - as time moves on the memory footprint has to be fairly stable.



So what would be a good approach for that in terms of underlying collection classes? Using Lists will be slow - as the rolling period means we will need to remove from the start of the List a lot. A Queue or LinkedList would solve that - but I don't think they provide a fast way to access the nth item in the collection (ElementAt just iterates so is very slow).

The only solution I can think of will involve storing the data twice - once in a collection thats easy to remove from the start of - and again in one thats easy to search in (with some awful background process to prune the stale points from the search collection somehow).

Is there a better way to approach this?

Thanks.

Answer

When I first saw the question I immediately thought of a queue, but most built-in queues do not efficiently allow indexed access, as you've found.

Best suggestion I can come up with is to use a ConcurrentDictionary. Thread-safe, near-constant access time by key, you can key directly to DateTimes, etc. it's everything you need functionally, except the behavior to manage size/timeline. To do that, you use a little Linq to scan the Keys property for keys more than one hour older than the newest one being added, then TryRemove() each one (you can derive from ConcurrentDictionary and override TryAdd() to do this automatically when adding anything new).

Only other potential issue is it's not terribly memory-efficient; the HashSet-based implementation of .NET Dictionaries requires a two-dimensional array for storage based on the hash of the key, and that array will be sparsely populated even with 100k items in the collection.

Comments