Crowcoder Crowcoder - 19 days ago 6
C# Question

Explain "unpredictable behavior" of enumerable when mutating property during foreach

Refer to the LinqPad script below.

While implementing a workflow, I grab the next set of

HasRun
tasks from a collection (IEnumerable). While iterating over the result set of the Linq query I change the task to
HasRun = true
. Debugging shows I initially get the expected subset of four objects, however, after the subset has all been marked, the enumerable suddenly resolves to the next subset and the foreach loop continues over that set as well, and then the next, etc.

So when I expect to iterate four times, it keeps going until all three subsets (9 items) have been iterated.
This is easily solved by
.ToList()
ing the enumerable but I wonder if this is intentional behavior.

In Googling I found references to "unpredicatable behavior" of iteration variables, this old post being one example, on which @jon skeet commented on, but the latest c# spec (sec. 8.8.4) makes no mention of unpredicable behavior, it simply mentions problems with assignment, increment and decrement:


A compile-time error occurs if the embedded statement attempts to modify the iteration variable (via assignment or the ++ and operators) or pass the iteration variable as a ref or out parameter.


Is this behavior by design?

void Main()
{
List<Foo> foos = new List<UserQuery.Foo>
{
new Foo{ SetNbr = 1, HasRun = false },
new Foo{ SetNbr = 1, HasRun = false },
new Foo{ SetNbr = 1, HasRun = false },
new Foo{ SetNbr = 1, HasRun = false },
new Foo{ SetNbr = 2, HasRun = false },
new Foo{ SetNbr = 2, HasRun = false },
new Foo{ SetNbr = 3, HasRun = false },
new Foo{ SetNbr = 3, HasRun = false },
new Foo{ SetNbr = 3, HasRun = false }
};

//Grab the first subset of Foos where HasRun is false, in order of SetNbr
var firstNotRunGroup = foos.Where(a => a.SetNbr == (foos.Where(f => f.HasRun == false).Min(f => f.SetNbr)));

foreach (Foo foo in firstNotRunGroup)
{
//foo = new Foo(); < Fails, as expected
foo.HasRun = true;
Console.WriteLine(foo.SetNbr);
}
}

class Foo
{
public int SetNbr { get; set; }
public bool HasRun { get; set; }
}


Output:

1
1
1
1
2
2
3
3
3

Answer

You have to remember that LINQ operations return queries not the results of executing queries. Every time you iterate over a LINQ sequence it re-computes the results of that query at that time. This means that if your query is based on some underlying collection or data store (in this case, you're using a List) and that data changes, subsequent iterations of the query, will reflect those changes.

On top of that, LINQ queries, to the best of their abilities, defer as much computation as possible during iteration; they only compute as much as they need to provide the next value. This means that changes to the underlying data store during enumeration can affect computations involving how the rest of the query is computed.

So, what does your code do. First you declare a query, firstNotRunGroup that doesn't actually do anything.

Then we start iterating over firstNotRunGroup in the foreach. It executes the predicate with a being the first item in the list. a.SetNbr is 1. Then we query over foos looking for the lowest set number of an item not run. That'll be 1, it's a match, so the first item is returned. We then set HasRun to true for that item and print it out.

Now the foreach goes to check if the second item is a match. The foos is queried again and the lowest set number with an item not run is 1, that's a match for the second item, so it runs it. This happens two more times.

Now the first four items in the list have all been run, and the foreach is now going to check if the fifth item in the list should be returned. The SetNbr is 2, and when it iterates over foo to see what the smallest set number of a not run item is it sees that all of the items from the first set have been run, so 2 is the smallest set number of an item not yet run. 2 matches the set number of the item we're querying, so it should be run.

As you can see from these two patterns, every item in the set will end up running. There are any number of things that could change this; if the list doesn't have the items in ascending order of set number your whole thing breaks (in different ways, depending on how the list is ordered), if you compute the smallest set with an item not yet run once, rather than recomputing that value for every single item in the set this wouldn't happen (and you're code would also not scale so horribly) or, as you said, if you computed the entire set of items in the first group not run before you started running the items, then you wouldn't get this result.

Comments