Maintaining IEnumerable-Yield Data Pipeline is Hard

27 Oct

.NET has had IEnumerable since version 1.0, and C# has had yield since version 2.0. I think it is one of the most misunderstood features of C#, mainly because it is implemented with deferred (lazy) execution. Before we delve into this, let us understand the iterator model.

IEnumerable and IEnumerator:

These two interfaces are the foundation of the C# iterator pattern, and Eric Lippert has written a number of detailed articles about them. A simple example follows.

There is a basket full of Oranges.

 
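using System.Collections.Generic;

public class Basket
{
    private readonly IEnumerable<Orange> Oranges;

    public Basket(IEnumerable<Orange> oranges)
    {
        Oranges = oranges;
    }

    // Hands back the sequence as-is; the caller decides when to iterate.
    public IEnumerable<Orange> GetOranges()
    {
        return Oranges;
    }
}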

public class Orange
{
    public int NumberOfSlices { get; set; }

    public void Peel()
    {
        // Peeling logic goes here.
    }

    public void Consume()
    {
        // Eating logic goes here.
    }
}

If we want to consume the oranges, we have to peel each one of them first, so we can simply iterate over them.

var basket = new Basket(new [] { new Orange(), new Orange(), new Orange() });
foreach (var orange in basket.GetOranges())
{
    orange.Peel();
    orange.Consume();
}
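To make the IEnumerable/IEnumerator relationship concrete, here is a minimal sketch of roughly what a foreach loop does behind the scenes. The ForEachByHand class and its CountSlices method are made up for illustration, and Orange is trimmed down to the one property the sketch needs.

```csharp
using System;
using System.Collections.Generic;

public class Orange
{
    public int NumberOfSlices { get; set; }
}

// Hypothetical helper, not part of the original example.
public static class ForEachByHand
{
    // Walks the sequence the way foreach does: get an enumerator,
    // call MoveNext() until it returns false, and read Current each time.
    public static int CountSlices(IEnumerable<Orange> oranges)
    {
        int total = 0;
        using (IEnumerator<Orange> enumerator = oranges.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                Orange orange = enumerator.Current;
                total += orange.NumberOfSlices;
            }
        }
        return total;
    }

    public static void Main()
    {
        var oranges = new[]
        {
            new Orange { NumberOfSlices = 10 },
            new Orange { NumberOfSlices = 11 },
            new Orange { NumberOfSlices = 9 }
        };
        Console.WriteLine(CountSlices(oranges)); // prints 30
    }
}
```

The IEnumerable is only a factory for enumerators; the enumerator is the object that remembers the current position.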

But the problem is, we need not consume all the oranges from the basket in one go (I am not hungry ;-).
So how can I get one orange from the basket at a time, or only when I am hungry?

Yield:

The C# yield keyword provides an answer to this problem. Here is the Basket class changed slightly to use yield (the signature of the GetOranges method is unchanged).

public class Basket
{
    private readonly Orange[] _oranges;

    public Basket(Orange[] oranges)
    {
        _oranges = oranges;
    }

    public IEnumerable<Orange> GetOranges()
    {
        for (int i = 0; i < _oranges.Length; i++)
        {
            yield return _oranges[i];
        }
    }
}

Now the basket provides one Orange at a time, that is, each time the MoveNext() method of the IEnumerator is called. In other words, execution of the “for” loop in the GetOranges method is paused until the next Orange is required.
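The pausing can be made visible by logging from both sides. The sketch below is an illustration only: the PauseAndResume class and its Log list are made up, and strings stand in for oranges.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the original example.
public static class PauseAndResume
{
    // Records events in order so the interleaving is visible.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<string> GetOranges()
    {
        Log.Add("iterator: started");
        for (int i = 1; i <= 2; i++)
        {
            Log.Add("iterator: yielding orange " + i);
            yield return "orange " + i; // execution pauses here until the next MoveNext()
        }
        Log.Add("iterator: finished");
    }

    public static List<string> Run()
    {
        Log.Clear();
        IEnumerable<string> oranges = GetOranges();
        Log.Add("caller: got the IEnumerable"); // nothing inside GetOranges has run yet

        foreach (string orange in oranges)
        {
            Log.Add("caller: eating " + orange);
        }
        return Log;
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

The first log entry comes from the caller, not the iterator, and after that the iterator and caller lines alternate: the body of GetOranges only advances when the foreach asks for the next item.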

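Or in more technical words: until you enumerate, no data is fetched, and once an enumerator has run to completion it has nothing more to give. Enumerating again re-runs the iterator body from the start, so if the underlying source can only be read once (a stream, a queue, a data reader), the second pass finds no data. It all boils down to “when you enumerate”.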
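One way to see that the moment of enumeration is what matters: in the sketch below, an orange is added to the source after GetOranges() has been called but before anything enumerates the result. ListBasket and WhenYouEnumerate are made-up names, and the basket holds a List of strings instead of an Orange array so that the source can change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of Basket backed by a mutable list.
public class ListBasket
{
    private readonly List<string> _oranges;

    public ListBasket(List<string> oranges)
    {
        _oranges = oranges;
    }

    public IEnumerable<string> GetOranges()
    {
        for (int i = 0; i < _oranges.Count; i++)
        {
            yield return _oranges[i];
        }
    }
}

public static class WhenYouEnumerate
{
    public static int CountAfterLateAdd()
    {
        var source = new List<string> { "orange 1" };
        var basket = new ListBasket(source);

        IEnumerable<string> oranges = basket.GetOranges(); // no element has been read yet
        source.Add("orange 2");                            // added after the call above

        return oranges.Count(); // enumeration happens here and sees both oranges
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterLateAdd()); // prints 2
    }
}
```

The result reflects the state of the source at the time of enumeration, not at the time GetOranges() was called.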

The Maintenance Problem:

This is tricky for developers who inherit an existing code base. A developer might accidentally add an enumeration (a Count(), a ToList(), or a foreach added for logging) ahead of the place where the pipeline is actually meant to be consumed. When the original enumeration then runs over a source that can only be read once, it finds no new data and the loop completes without a single iteration. This happens silently, without any exception. So take care to track where you are yielding and where you are enumerating.
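The failure described above can be reproduced in a few lines. This is a sketch under assumptions: a Queue stands in for any one-shot source, and the EarlyEnumeration class and its method names are invented for the example. The second method shows one possible mitigation, which is to enumerate once into a list and reuse that list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original example.
public static class EarlyEnumeration
{
    // Drains a one-shot source: every item handed out is removed from the queue.
    public static IEnumerable<string> ReadAll(Queue<string> source)
    {
        while (source.Count > 0)
        {
            yield return source.Dequeue();
        }
    }

    // The bug: a Count() added upstream enumerates the pipeline
    // and empties the queue before the real consumer runs.
    public static (int Reported, int Processed) WithEarlyCount()
    {
        var source = new Queue<string>(new[] { "a", "b", "c" });
        IEnumerable<string> pipeline = ReadAll(source);

        int reported = pipeline.Count();  // first enumeration consumes everything

        int processed = 0;
        foreach (string item in pipeline) // second enumeration finds nothing, no exception
        {
            processed++;
        }
        return (reported, processed);
    }

    // One possible fix: enumerate exactly once into a list, then reuse the list.
    public static (int Reported, int Processed) WithSnapshot()
    {
        var source = new Queue<string>(new[] { "a", "b", "c" });
        List<string> items = ReadAll(source).ToList();

        int reported = items.Count;
        int processed = 0;
        foreach (string item in items)
        {
            processed++;
        }
        return (reported, processed);
    }

    public static void Main()
    {
        var bad = WithEarlyCount();
        var good = WithSnapshot();
        Console.WriteLine("early count: reported " + bad.Reported + ", processed " + bad.Processed);   // reported 3, processed 0
        Console.WriteLine("snapshot:    reported " + good.Reported + ", processed " + good.Processed); // reported 3, processed 3
    }
}
```

Materializing with ToList() gives up laziness and holds everything in memory, so it is a trade-off rather than a universal answer.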

“Yield return” is a great feature in C#, but use it with caution.  Let me know your thoughts.

4 thoughts on - Maintaining IEnumerable-Yield Data Pipeline is Hard

  • Srinivas Paila
    Oct 28, 2015 at 7:23 am

    Awesome… I liked the way you explained with oranges example.

    • srkshanky
      Oct 28, 2015 at 6:04 pm

      Thanks Paila.

  • Seshu
    Nov 15, 2015 at 6:41 am

    Nice article shankar.

  • Pingback: Delay your execution using IEnumerable | Ace Infoway
