Thursday, March 27, 2008

System.ComponentModel.Win32Exception is NOT my friend

We've been trying to track down a really sneaky problem for months. Every now and then, at seemingly random times, our application crashes. Very aggravating for the users, especially when they have a patient in the CT scanner at the time.

It's aggravating for us too, because the stack traces are very light on detail, and the detail they do provide is alarmingly deep in the bowels of Windows Forms.

There are two frequent offenders, both of them System.ComponentModel.Win32Exceptions: "The operation completed successfully" and "Not enough storage is available to process this command".

System.ComponentModel.Win32Exception: The operation completed successfully
at System.Windows.Forms.DibGraphicsBufferManager.CreateCompatibleDIB(IntPtr hdc, IntPtr hpal, Int32 ulWidth, Int32 ulHeight, IntPtr& ppvBits)
at System.Windows.Forms.DibGraphicsBufferManager.CreateBuffer(IntPtr src, Int32 offsetX, Int32 offsetY, Int32 width, Int32 height)
at System.Windows.Forms.DibGraphicsBufferManager.AllocBuffer(Graphics targetGraphics, IntPtr targetDC, Rectangle targetBounds)
at System.Windows.Forms.DibGraphicsBufferManager.AllocBufferInTempManager(Graphics targetGraphics, IntPtr targetDC, Rectangle targetBounds)
at System.Windows.Forms.DibGraphicsBufferManager.AllocBuffer(IntPtr target, Rectangle targetBounds)
at System.Windows.Forms.Control.WmPaint(Message& m)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ScrollableControl.WndProc(Message& m)
at System.Windows.Forms.ContainerControl.WndProc(Message& m)
at System.Windows.Forms.UserControl.WndProc(Message& m)
at System.Windows.Forms.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)



Exception: System.ComponentModel.Win32Exception
Message: Not enough storage is available to process this command
Source: System.Windows.Forms
at System.Windows.Forms.DibGraphicsBufferManager.CreateCompatibleDIB(IntPtr hdc, IntPtr hpal, Int32 ulWidth, Int32 ulHeight, IntPtr& ppvBits)
at System.Windows.Forms.DibGraphicsBufferManager.CreateBuffer(IntPtr src, Int32 offsetX, Int32 offsetY, Int32 width, Int32 height)
at System.Windows.Forms.DibGraphicsBufferManager.AllocBuffer(Graphics targetGraphics, IntPtr targetDC, Rectangle targetBounds)
at System.Windows.Forms.DibGraphicsBufferManager.AllocBufferInTempManager(Graphics targetGraphics, IntPtr targetDC, Rectangle targetBounds)
at System.Windows.Forms.DibGraphicsBufferManager.AllocBuffer(IntPtr target, Rectangle targetBounds)
at System.Windows.Forms.Control.WmPaint(Message& m)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ScrollableControl.WndProc(Message& m)
at System.Windows.Forms.ContainerControl.WndProc(Message& m)
at System.Windows.Forms.UserControl.WndProc(Message& m)
at System.Windows.Forms.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)

Leaving aside for a moment the question of why "the operation completed successfully" should crash an application, let's carry on with the investigation.
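On that question, for what it's worth: both messages are simply the system text for a Win32 error code. A Win32Exception created with no arguments picks up the thread's last Win32 error, so if the failing native call did not set one (presumably what happened here), the code is 0, ERROR_SUCCESS, whose text is "The operation completed successfully". "Not enough storage is available to process this command" is the text for error 8, ERROR_NOT_ENOUGH_MEMORY. Since the stack traces alone don't show the code, here is a minimal sketch of logging NativeErrorCode alongside the message. The CrashLog class and its Describe method are hypothetical, not part of our application or of the framework.

```csharp
using System;
using System.ComponentModel;

// Hypothetical crash-log helper: records the native Win32 error code next to
// the message, so code 0 and code 8 can be told apart at a glance.
public static class CrashLog
{
    public static string Describe(Win32Exception ex)
    {
        string meaning;
        switch (ex.NativeErrorCode)
        {
            case 0:
                // ERROR_SUCCESS: the failing call left no last-error code behind.
                meaning = "ERROR_SUCCESS";
                break;
            case 8:
                meaning = "ERROR_NOT_ENOUGH_MEMORY";
                break;
            default:
                meaning = "other Win32 error";
                break;
        }
        return string.Format("Win32Exception {0} ({1}): {2}",
            ex.NativeErrorCode, meaning, ex.Message);
    }

    public static void Main()
    {
        // The message text comes from the OS and depends on its display language.
        Console.WriteLine(Describe(new Win32Exception(0)));
        Console.WriteLine(Describe(new Win32Exception(8)));
    }
}
```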

We grabbed some crash dumps and sent them off to Microsoft for analysis. Microsoft's recommendation was to turn off double buffering. We tried it, and the results were atrocious: there was a lot of flickering, and painting was slow and incomplete in some areas. Not a good solution.

Microsoft then told us to try setting a smaller font on a Label control. That wasn't a good solution either: it didn't work, and it didn't look very good.

One day, while trolling the bug bucket looking for something to do, I found an issue about disposing Graphics objects. That's a good thing to do, I figured, so I dug into the code and looked for CreateGraphics() calls. To my dismay, quite a few of them returned Graphics objects that were never disposed, either explicitly or with a using() block.

I added the calls to Dispose(), or put the objects in a using() block, and went on my way.
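Both forms of the fix look roughly like this. This is a minimal sketch: CaptionControl and its measuring methods are hypothetical stand-ins for the real controls, and MeasureString is just an example of work done with the Graphics.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

// Hypothetical user control, used only to illustrate disposing CreateGraphics().
public class CaptionControl : UserControl
{
    // Option 1: a using block calls Dispose() when the block exits,
    // even if MeasureString throws.
    public SizeF MeasureWithUsing(string text)
    {
        using (Graphics g = CreateGraphics())
        {
            return g.MeasureString(text, Font);
        }
    }

    // Option 2: an explicit Dispose() in a finally block, which is
    // what the using statement expands to.
    public SizeF MeasureWithDispose(string text)
    {
        Graphics g = CreateGraphics();
        try
        {
            return g.MeasureString(text, Font);
        }
        finally
        {
            g.Dispose();
        }
    }
}
```

Where only a text measurement is needed, TextRenderer.MeasureText(text, Font) avoids creating a Graphics at all, though it measures with GDI rather than GDI+ and can return slightly different sizes.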

A couple of weeks later, one of our developers stopped by my desk to ask about a strange thing. He had been working on some automated testing of the application, and he noticed that all of a sudden, it stopped crashing. He had been getting lots of System.ComponentModel.Win32Exceptions after running his automated tests, and in the past few builds, they were gone.

Now, the jury is still out on whether disposing of the Graphics objects fixed these crashes, but it certainly looks that way. The updated software should be installed in the next day or two, and then we should know for sure.
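One way to help the jury along is to watch the process's GDI and USER object counts while the automated tests run; a leak of Graphics objects or controls should show up as a count that keeps climbing toward the default per-process limit of 10,000. A minimal sketch, calling the Win32 GetGuiResources function through P/Invoke (the GuiResources wrapper class is hypothetical):

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical wrapper that reads this process's GDI and USER handle counts.
public static class GuiResources
{
    private const uint GR_GDIOBJECTS = 0;
    private const uint GR_USEROBJECTS = 1;

    // user32!GetGuiResources returns the count, or 0 on failure.
    [DllImport("user32.dll")]
    private static extern uint GetGuiResources(IntPtr hProcess, uint uiFlags);

    public static uint GdiObjects()
    {
        using (Process p = Process.GetCurrentProcess())
        {
            return GetGuiResources(p.Handle, GR_GDIOBJECTS);
        }
    }

    public static uint UserObjects()
    {
        using (Process p = Process.GetCurrentProcess())
        {
            return GetGuiResources(p.Handle, GR_USEROBJECTS);
        }
    }

    public static void Main()
    {
        // Log this periodically (e.g. from a timer) during a test run;
        // steady growth across iterations suggests a handle leak.
        Console.WriteLine("GDI objects: {0}, USER objects: {1}",
            GdiObjects(), UserObjects());
    }
}
```

Task Manager's optional "GDI Objects" and "USER Objects" columns show the same counts without writing any code.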

Call it a lucky accident that I made those changes, but it definitely makes me look smarter than I really am. I won't tell anyone if you won't.

15 comments:

  1. Huh. My company's having this problem too. If you resolve it, give me a hoot.

  2. How about, if YOU resolve it, you give ME a hoot. :)

    First one to hoot gets bragging rights.

  3. alan farquhar, 9:24 am

    Huh, my company is having this too! If either of you guys resolve this, then you're hired.

  4. Can I use you as a reference?

  5. Brendan, 8:22 am

    My company was having this same issue. Our problem was that we were not properly disposing of User Controls. Our app is set up as one parent container that hosts child User Controls. When we were finished with one of the User Controls, we were just setting the reference to null. According to a post that I read, a process cannot have more than 10,000 User Objects. Since we were not disposing of the User Controls, the User Object count increased until the 10,000 mark was hit. So we changed our code to call Dispose() on the User Control, and that fixed the issue.

  6. Brendan, I think that part of our problem was similar. We've used the Scitech memory profiler (http://scitech.se/) to track down the controls that we've forgotten to dispose. It's been a huge help.

  7. I recently had the same issue, and luckily my management were more than happy for me to discuss it with Microsoft's Windows Forms team.

    This is the real issue behind the error:

    The exception is thrown when the graphics objects are told to draw at a size of (0, X) or (X, 0) or (0, 0).

    This usually occurs because the WM_SIZE window message is not received.

    See this blog for details

    So when WM_SIZE is not received, your controls try to redraw themselves but may pass zeros into the drawing methods. This error was occurring for us when we restored a minimized application into view.

    Microsoft said "this happens with nested user controls within forms if the Kernel stack exceeds the stack size"

    The fix:

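    // Defer the base size-change handling to a later message, so deeply
    // nested controls don't all resize on the same call stack.
    protected override void OnSizeChanged(EventArgs e)
    {
        // Handle is an IntPtr and is never null; IsHandleCreated is the real test.
        // BeginInvoke needs a window handle, so fall back to a direct call.
        if (this.IsHandleCreated)
        {
            this.BeginInvoke(new EventHandler(HandleOnSizeChanged), this, e);
        }
        else
        {
            base.OnSizeChanged(e);
        }
    }

    private void HandleOnSizeChanged(object sender, EventArgs e)
    {
        base.OnSizeChanged(e);
    }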

    Using BeginInvoke stops the "nesting" from taking you too far. A control might still handle two layers of nesting in one go, but BeginInvoke places the child controls' work back onto the message queue, so it is processed later, starting from an earlier (shallower) point on the stack.

    Stack without fix    Stack with fix
    a                    a c e
    b                    b d f
    c
    d
    e
    f

    It is up to you where you place this code. I placed it within some user controls and also within a base form class that every form in our app inherited from.

    Note: if you no longer get the error but do get some screen redraw issues, you need to apply the fix to the parent of the control with the redraw issues as well.

    When I talked to the Microsoft engineer, he showed me how to use Spy++ to watch each control in the process receive the WM_SIZE message in its messages window; this will help you track it down too. Microsoft has both a 32-bit and a 64-bit Spy++ available as a separate download, so the machine you're having the issue on doesn't have to be your development machine.

    I won't wish you luck as hopefully this will resolve your problems.

    Regards,
    Pete

  8. Apologies, it didn't format correctly.

    The stack example should be:

    Without Fix

    a
    b
    c
    d
    e
    f


    With Fix
    a c e
    b d f

  9. Thanks Peter. I'll look into this.

    The blog post you mentioned indicates that this is only an issue on x64, but we were definitely seeing it on x86 machines.

    We did have some success by reducing the memory fragmentation in our app, and by removing some top-level windows. That may have helped to reduce the frequency of this crash.

  10. Anonymous, 9:43 am

    I see this in a very basic WinForms server app running on x64 (although the app is a .NET 3.5 app targeting x86). There are no graphics or anything like that, but there are a few user controls.

    It happens once in a while when you either connect to or disconnect from the server using MSTSC (Terminal Services).

  11. Yes, the link does show x64, but the principle is the same.

    If the kernel stack grows to reach its maximum limit, usually through control nesting, then window messages do not get cascaded down to the bottom of the nesting.

    It just occurs sooner on x64. The kernel stack is the same size as on x86, but I would assume the larger pointer size means each level of nesting uses more of it.

    Disposing of unused controls / out of scope controls would help this issue as they would be removed from the kernel stack.

  12. In my case today, my Win32Exception ('not enough storage') was provoked by an OutOfMemoryException a split second earlier that I had not caught. So don't rule out that something entirely unrelated provoked it.

  13. The "not enough storage" message is really the notice you get that you've run out of memory. Working out why you ran out of memory is the hard part.

    In our case an unmanaged C++ dll was fragmenting memory quite a bit. When the GDI+ double buffering tried to allocate enough memory to do the offscreen paint, it couldn't get a big enough chunk.

  14. Mark and Peter, it looks like I am suffering the same problem, with nested loops of large images. My application has unmanaged C++ code built into a DLL, and the UI is written in VB.NET.
    Do you have any suggestions like this for the C++ side or the VB side, so that my calls stop filling up the stack?
    I have been stuck on this problem for a long time and am really looking for a solution.
    Thanks in advance for your help.

  15. Anonymous, 5:51 am

    I am getting the error when a .NET 2.0 app is moved to 4.0. The 2.0 app works fine, but when the project properties in VS are changed, this error starts appearing.

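Brendan's fix in comment 5 comes down to disposing a child User Control when it is swapped out, rather than just dropping the reference. A minimal sketch, assuming a hypothetical host form (ShellForm) that shows one child control at a time in a single panel; ShellForm and ShowChild are illustrative names, not code from either application. Calling Dispose() releases the control's window handle, and those of its children, right away, whereas a control that is merely removed or forgotten can hang on to its handles.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical host form: one panel showing one child UserControl at a time.
public class ShellForm : Form
{
    private readonly Panel contentPanel = new Panel { Dock = DockStyle.Fill };
    private UserControl current;

    public ShellForm()
    {
        Controls.Add(contentPanel);
    }

    // Shows the next child control and disposes the one it replaces.
    public void ShowChild(UserControl next)
    {
        UserControl old = current;
        current = next;

        next.Dock = DockStyle.Fill;
        contentPanel.Controls.Add(next);

        if (old != null)
        {
            contentPanel.Controls.Remove(old);
            // Dispose() destroys the handles of the old control and its
            // children now; setting the reference to null alone does not.
            old.Dispose();
        }
    }
}
```

Disposing the form itself disposes every control still in its Controls collection, so only the controls you remove yourself need this explicit treatment.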