Mrinal Kamboj Mrinal Kamboj - 3 months ago 15
C# Question

Improve the time complexity of current Linq queries

I have the following lists:

RakeSnapshots, ProductMovements


Aim is to process the both and get the count of elements that match a condition, as follows:


  • Consider
    RakeSnapshots
    with
    StatusCode == "Dumping"

  • Consider
    ProductMovement
    with
    Status == "InProgress"

  • Fetch the
    count
    of all elements both lists, which meet the condition
    RakeSnapshots.RakeCode
    equal to
    ProductMovements.ProductCode



Following are my current options:

// Code 1:

var resultCount = ProductMovements.Where(x => RakeSnapshots
.Where(r => r.StatusCode == "Dumping")
.Any(y => y.RakeCode == x.ProductCode &&
x.Status == "InProgress"))
.Count();


// Code 2:

var productMovementsInprogress = ProductMovements.Where(x => x.Status == "InProgress");

var rakeSnapShotsDumping = RakeSnapshots.Where(r => r.StatusCode == "Dumping");

var resultCount = productMovementsInprogress.Zip(rakeSnapShotsDumping,(x,y) => (y.RakeCode == x.ProductCode) ? true : false)
.Where(x => x).Count();


Challenge is both the codes are
O(n^2)
complexity, is there a way to improve it, this will hurt if the data is very large

Answer

You can use an inner join to do this:

var dumpingRakeSnapshots       = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");

var matches =
    from r in dumpingRakeSnapshots
    join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
    select r;

int count = matches.Count(); // Here's the answer.

Note that (as Ivan Stoev points out) this only works if RakeCode is the primary key of RakeSnapshots.

If it is not, you will have to use a grouped join.

Here's the Linq query syntax version that you should use in that case, but note that this is exactly the same as Ivan's answer (only in Linq query form):

var matches =
    from r in dumpingRakeSnapshots
    join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
    select gj;

For completeness, here's a compilable console app that demonstrates the different results you'll get if RakeCode and ProductCode are not primary keys:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp1
{
    class RakeSnapshot
    {
        public string StatusCode;
        public string RakeCode;
    }

    class ProductMovement
    {
        public string Status;
        public string ProductCode;
    }

    sealed class Program
    {
        void run()
        {
            var rakeSnapshots = new List<RakeSnapshot>
            {
                new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
                new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
                new RakeSnapshot {StatusCode = "Dumping", RakeCode = "2"}
            };

            var productMovements = new List<ProductMovement>
            {
                new ProductMovement {Status = "InProgress", ProductCode = "1"},
                new ProductMovement {Status = "InProgress", ProductCode = "2"},
                new ProductMovement {Status = "InProgress", ProductCode = "2"}
            };

            var dumpingRakeSnapshots       = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
            var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");

            // Inner join.

            var matches1 =
                from r in dumpingRakeSnapshots
                join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
                select r;

            Console.WriteLine(matches1.Count());

            // Grouped join.

            var matches2 =
                from r in dumpingRakeSnapshots
                join p in inProgressProductMovements on r.RakeCode equals p.ProductCode into gj
                select gj;

            Console.WriteLine(matches2.Count());

            // OP's code.

            var resultCount = 
                productMovements
                .Count(x => rakeSnapshots
                .Where(r => r.StatusCode == "Dumping")
                .Any(y => y.RakeCode == x.ProductCode && x.Status == "InProgress"));

            Console.WriteLine(resultCount);
        }

        static void Main(string[] args)
        {
            new Program().run();
        }
    }
}