Sep 1 2009

A WCF Tale of Fail

Category: AlexRobson @ 13:34

Sometimes I think a better title for my blog would be Behind The Learning Curve… but that’s a topic for another time.

Recently we started seeing rolling WCF channel request timeout exceptions in our production environment. It seemed the only way to get the services back in working order was to cycle the app pool the services were hosted in. You don’t have to be Juval Lowy to realize that this points to a misuse of WCF. But where exactly? Finding the issue was going to be challenging since this particular project has been a case study in feature-creep and has had a constantly changing team (we used a lot of consultants for temporary staffing) let’s just say there’s a significant amount of technical debt in the form of inconsistently applied architectural approaches and implementations.

A large part of this is entirely my fault. I didn’t take time to make sure every team member on the project was familiar with the proper way to close/abort/dispose the proxy. Not only that but I have been wrong (twice now) regarding how to make absolutely sure you’re not going to create memory leaks by not correctly handling the proxy when you’re finished using it.Each time my understanding of the pattern improved, I would update the team, update the code I’d written and try to make more helpers available to handle the issue.

My first mistake was thinking that all I had to do was close the proxy in a finally block, like this:

using (var service = new ServiceClientProxy())
{
    try
    {
        service.SomeCall();
    }
    catch
    {
        //handle the exception
    }
    finally
    {
        service.Close();
    }
}

It wasn’t long until I discovered on my own that this is very, very wrong. If for any reason the channel faults, you’ll get another exception in your finally block for trying to close a faulted channel. You have to call abort on the proxy once the state becomes faulted. So I updated my code and the team to let them know about that finding and (foolishly) made the assumption all would be well after updating my helper classes and direct uses of the clients as above to look like the pattern below:

using (var service = new ServiceClientProxy())
{
    try
    {
        service.SomeCall();    
    }    
    catch    
    {
        //handle the exception    
    }    
    finally    
    {        
        if(service.State != CommunicationState.Faulted)
        {
            service.Close();
        }
        else
        {
            service.Abort();
        }
    }
}

But as it turns out, that’s still not a correct/complete pattern because it still doesn’t address the fact that when you call Close on the service proxy, you can get an exception. (thanks Evan Hoff for pointing this one out to me recently) So the correct pattern is actually:

using (var service = new ServiceClientProxy())
{
    try
    {
        service.SomeCall();
    }
    catch
    {
        //handle the exception
    }
    finally
    {
        if(service.State != CommunicationState.Faulted)
        {
            try
            {
                service.Close();
            }
            catch
            {
                service.Abort();
            }
        }
        else
        {
            service.Abort();
        }
    }
}

Now, there are a lot of different ways to keep from repeating this finally block all over the place. For a temporary fix, I introduced the following extension method (because I’m lazy and shameless)

public static class ServiceHelperExtensions
{
    public static void DisposeCorrectly(this ICommunicationObject wcfProxy)
    {
        if(wcfProxy.State != CommunicationState.Faulted)
        {
            try
            {
                service.Close();
            }
            catch
            {
                service.Abort();
            }
        }
        else
        {
            service.Abort();
        }
    }
}

And while this was a quick way for me to find all uses of the service client proxies and introduce the following code very quickly (thanks to ReSharper)

using(var service = new ServiceClientProxy())
{
    try
    {
        service.Call();
    }
    finally
    {
        service.DisposeCorrectly();
    }
}

In the future I don’t plan on using this approach long term and instead have been working on a dynamic WCF proxy class so that we can kiss the generated proxies goodbye forever (or so I hope).

Why admit to all these pretty careless mistakes and effectively award myself oodles of doofus points? Well, for one, no one person on the team was using the correct pattern on their own. That’s not because we have a bad team. I actually believe we have a really great team. And if a team of good developers all make the same mistakes, chances are there are plenty developers out there who are making similar mistakes regarding the proper handling of the proxy and maybe they’ll stumble across this blog and realize “Hey, I’m not alone, Alex was making this mistake way before I was!”.

And of course, I’m also taking my lumps so I can pass some out. To the group of disconnected ne’er-do-wells at Microsoft who are responsible for the no-ass implementation of IDisposable in WCF:

Dear WCF Team,

Why do you hate me so much? I mean, so much of WCF is actually pretty flipping sweet. That whole two-way binding thing you guys did, that’s awesome! My buddy Jim is really excited about that. And making it possible to get the binding and endpoint from a meta data exchange on the service is brilliant. But why in the name of sweet baby Bill Gates didn’t you handle this mess of crap in your IDisposable implementation? Things that implement IDisposable (especially things in the FRAMEWORK) should in fact be disposed of and not create memory leaks if I don’t perform the secret hand-shake of try/if/try/catch/else/finally.

You crapped on my code and broke my heart WCF team.

Sincerely,
Alex

Tags:

Comments

1.
Jim Cowart Jim Cowart United States says:

I think the extension method was a nice way to "tidy up the pain" of the "try/if/try/catch/else/finally-vulcan-nerve-pinch".  Still psyched about the two-way binding though....client and server will be able to nerve pinch each other...should get interesting. Smile

2.
Jarrod Marshall Jarrod Marshall United States says:

My understanding has always been that you are forced to abort the channel on a fault because the channel is always unusable if its a server side fault.

The channel still hangs around and does not dispose allowing you to gather any information on the channel that may be needed (like reopening a new channel and retrying the connection).

The big thing here - at least to me - is that the proxy hides most of the work with channels. I prefer to work with the channel factory and channel directly myself. Its much more lightweight  since I am only dealing with interfaces and I maintain control of when my channels are opened, closed, aborted and their lifetime.

It's also much easier to implement your own connection management approach if you want to pool your wcf connections. I know the newer WCF generated proxies do this to an extent - but if you're dealing with TCP bindings and load balancing it can become more complex.

Also - if you're using duplex binding in IIS - watch your web garden settings. Your established callback connections can vanish if you drop your app domain - in that case your service loses track of who it needs to call back to.

@jarrod268

3.
pingback topsy.com says:

Pingback from topsy.com

Twitter Trackbacks for
        
        A WCF Tale of Fail
        [sharplearningcurve.com]
        on Topsy.com

4.
AlexRobson AlexRobson United States says:

@Jarod,

I really appreciate your feedback! I remember from talking to you before that you have done far more with WCF than anyone else I've met. I like your approach of dealing with the channel factory directly and I've actually got a dynamic WCF proxy "library" in the works (very simple, but it seems to work nicely).

My primary gripe is the re-purposing of the IDisposable interface; if interfaces define contracts for behavior, and a class implements that contract, I expect the implementation to be somewhat predictable or at least consistent in regards to the result. There are other patterns the team could have used to make the fault information available to the consumer without resorting to leaving a faulted channel open/in memory.

Anyway, hopefully others who were unclear about this behavior will stumble on this blog and realize there's more to it than they expected. I think it's great that you've provided your experience and approach here for others to find and think about or implement themselves.

Add comment


(Will show your Gravatar icon)

  Country flag

biuquote
  • Comment
  • Preview
Loading