Deterministic Finalization and IDisposable Part 1: The Basics

This is part 1/5 of my Deterministic Finalization and IDisposable post series.

This topic has been covered many times by many others (such as here and here), so if you are familiar with C#’s using statement and IDisposable interface, feel free to skip this post. I’m writing this introduction to provide the necessary background information to set up a series of subsequent posts.

Garbage collection, found in languages such as C# and Java (among many others), is a very useful feature: it largely alleviates the need for a programmer to manually handle resource management. The most commonly cited benefit is that garbage collection eliminates the need for the programmer to explicitly call heap memory management functions such as malloc and free; instead, the garbage collector automatically keeps track of whether objects are still in use and frees them when they are no longer needed.[1] However, in addition to handling memory management, garbage collection may also release other scarce resources upon cleanup, such as file locks or network connections.

An important point to note about most (all?) garbage collectors is that they are nondeterministic. This means that, in general, a programmer does not and should not know when the actual garbage collection phase happens.[2] In other words, a program could stop using an object but its underlying memory may not be freed for seconds, minutes, hours, days, or possibly ever. Usually this is a good thing: deferring and batching the cleanup work can often be a large performance boost.

However, as I mentioned above, garbage collection manages more than just memory. Consider what happens when you call .NET’s File.Open() method, which returns a FileStream object with which you can read and write bytes to the file. Unless explicitly specified otherwise, the FileStream will create an exclusive lock on the underlying file; no other process (or thread) will be able to open the file for reading or writing while the FileStream is open. Usually this isn’t much of a problem, as once the process has ended the file will be closed and most processes are short-lived.

Consider, if you will, the case where the process isn’t short-lived. Perhaps the process opened up the file and wrote to it without explicitly closing it, expecting the garbage collector to eventually notice that the process was done with the file and to close it, releasing the lock. However, as the garbage collector is nondeterministic, we simply don’t know when, if ever, the garbage collector will close the file, and the process will keep a lock on the file for potentially a very long time.[4]

Another way to illustrate the above problem is to consider the following C# code which first writes to a file and then immediately reopens the file to read from it; the code as shown is virtually guaranteed to fail.

string filename = ...;
FileStream writeStream = File.Open(filename, FileMode.Create, FileAccess.Write);
writeStream.Write(...);

// The following line is virtually guaranteed to throw an IOException:
// the file is still exclusively locked because writeStream has been
// neither closed nor garbage collected yet.
FileStream readStream = File.Open(filename, FileMode.Open, FileAccess.Read);
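
To make the failure concrete, here is a minimal sketch of the same scenario as a complete method. The class name ExclusiveLockDemo and the method SecondOpenFails are my own inventions for illustration, and the single byte written stands in for whatever the real code would write. The cleanup in the finally block exists only so that the demonstration doesn't leak the lock it is demonstrating.

```csharp
using System;
using System.IO;

// Hypothetical demo class; the names are not from any library.
public static class ExclusiveLockDemo
{
    // Returns true if reopening the file throws an IOException while the
    // first FileStream is still open.
    public static bool SecondOpenFails(string filename)
    {
        FileStream writeStream = File.Open(filename, FileMode.Create, FileAccess.Write);
        try
        {
            writeStream.WriteByte(42);
            try
            {
                // This File.Open overload requests no sharing, so the
                // attempt conflicts with the lock held by writeStream.
                FileStream readStream = File.Open(filename, FileMode.Open, FileAccess.Read);
                readStream.Close();
                return false;
            }
            catch (IOException)
            {
                return true;
            }
        }
        finally
        {
            // Release the lock so the demo itself does not leak it.
            writeStream.Close();
        }
    }
}
```

Calling ExclusiveLockDemo.SecondOpenFails(Path.GetTempFileName()) should return true on Windows; other platforms may enforce file locks differently.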

Now, many developers will say “That’s easy to solve. Just call the FileStream.Close() method when you are done with the FileStream.” (A few may say to call GC.Collect(), but that’s a bad idea.[3]) OK, fine, let’s add the Close() to the above code:

string filename = ...;
FileStream writeStream = File.Open(filename, FileMode.Create, FileAccess.Write);
writeStream.Write(...);
writeStream.Close();

In the above code, what happens if writeStream.Write() throws an exception which is caught and handled at a higher level? That’s right: Close() is never called and once again you are dependent on the whims of the garbage collector to clean up the file.[5]

One common solution to the above problem is to wrap the code using a try {} finally {} block. For example:

string filename = ...;
FileStream writeStream = null;
try
{
    writeStream = File.Open(filename, FileMode.Create, FileAccess.Write);
    writeStream.Write(...);
}
finally
{
    if (writeStream != null)
        writeStream.Close();
}
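
As a rough check of the pattern above, the following sketch simulates a failure after the write and then confirms that the file can be reopened immediately. The class TryFinallyDemo, its method names, and the simulated InvalidOperationException are all assumptions made for illustration, not part of any real API.

```csharp
using System;
using System.IO;

// Hypothetical demo class; the names are not from any library.
public static class TryFinallyDemo
{
    // Writes one byte and then throws, simulating a failure mid-work.
    public static void WriteThenFail(string filename)
    {
        FileStream writeStream = null;
        try
        {
            writeStream = File.Open(filename, FileMode.Create, FileAccess.Write);
            writeStream.WriteByte(1);
            throw new InvalidOperationException("simulated failure");
        }
        finally
        {
            // Runs whether or not an exception was thrown.
            if (writeStream != null)
                writeStream.Close();
        }
    }

    // Returns true if the file can be reopened, and the byte read back,
    // right after the exception has been handled.
    public static bool CanReopenAfterFailure(string filename)
    {
        try
        {
            WriteThenFail(filename);
        }
        catch (InvalidOperationException)
        {
            // Caught and handled at a higher level, as in the scenario above.
        }

        FileStream readStream = File.Open(filename, FileMode.Open, FileAccess.Read);
        try
        {
            return readStream.ReadByte() == 1;
        }
        finally
        {
            readStream.Close();
        }
    }
}
```

Because Close() runs in the finally block, the byte is flushed and the lock released before the exception ever reaches the caller.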

The C# developers, being pretty bright people, recognized that the above situation is actually fairly common — that in addition to garbage collection’s nondeterministic finalization, programs also often need a form of deterministic finalization to free scarce resources as soon as possible. To this end, they invented two concepts: the IDisposable interface and the using statement.

The IDisposable interface contains exactly one method: Dispose(). It is nothing but a cleanup method which uses a slightly more generic name than Close(). Many diverse objects implement IDisposable, from AsymmetricAlgorithm to Image to SqlConnection. A list of direct implementers of IDisposable in the .NET Class Library is here, but please note that it doesn’t include classes which indirectly implement IDisposable by having a parent (or grandparent, or great-grandparent…) class which is a direct implementer.
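
For readers who have never written one, here is a minimal sketch of a class implementing IDisposable. TrackedResource is a hypothetical type invented for this example; it holds no real resource and only records that Dispose() was called. It is deliberately bare-bones and should not be read as a complete guideline for writing disposable classes.

```csharp
using System;

// Hypothetical resource wrapper, invented for illustration only.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        // Dispose() may be called more than once, so repeat calls do nothing.
        if (IsDisposed)
            return;

        // A real class would release its file handle, socket, etc. here.
        IsDisposed = true;
    }
}

// Hypothetical demo class; the names are not from any library.
public static class DisposableDemo
{
    // Returns true if Dispose() ran even though the work threw an exception.
    public static bool DisposeRunsDespiteException()
    {
        TrackedResource resource = new TrackedResource();
        try
        {
            try
            {
                throw new InvalidOperationException("simulated failure");
            }
            finally
            {
                resource.Dispose();
            }
        }
        catch (InvalidOperationException)
        {
            // Handled here so the demo can report the result.
        }
        return resource.IsDisposed;
    }
}
```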

The using statement is basically nothing but syntactic sugar, as

using (FileStream fs = File.Open(filename, FileMode.Create, FileAccess.Write))
{
    // ... do work with fs
}

… is more-or-less short for

FileStream fs = null;
try
{
    fs = File.Open(filename, FileMode.Create, FileAccess.Write);
    // ... do work with fs
}
finally
{
    if (fs != null)
    {
        ((IDisposable) fs).Dispose();
    }
}
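
Applied to the earlier write-then-read scenario, the using statement might look like the following sketch. The class UsingDemo and the method WriteThenRead are hypothetical names, and the single byte written is a stand-in for real data.

```csharp
using System;
using System.IO;

// Hypothetical demo class; the names are not from any library.
public static class UsingDemo
{
    // Writes one byte, then reopens the same file and reads the byte back.
    public static int WriteThenRead(string filename)
    {
        using (FileStream writeStream = File.Open(filename, FileMode.Create, FileAccess.Write))
        {
            writeStream.WriteByte(7);
        } // Dispose() runs here, flushing the data and releasing the lock.

        using (FileStream readStream = File.Open(filename, FileMode.Open, FileAccess.Read))
        {
            return readStream.ReadByte();
        }
    }
}
```

The second File.Open succeeds because the first stream has already been disposed by the time the first using block ends, whether the block exits normally or through an exception.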

The cast in the code fragment ((IDisposable) fs).Dispose(); is necessary because it is possible in C# to implement interface methods which are only exposed via that particular interface and not by the implementing class (this is known as explicit interface implementation; see here). In other words, the following code won’t compile:

class A : IDisposable
{
    void IDisposable.Dispose() { ... }
}

A a = new A();
a.Dispose();

… whereas if you replace a.Dispose() with ((IDisposable) a).Dispose(); it will. This feature was likely added to allow a class to implement two separate interfaces which have a method with an identical name and signature.
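
Here is a compilable sketch of the same idea. ExplicitDisposable and ExplicitInterfaceDemo are hypothetical names of my own, and the IsDisposed property exists only so the result can be observed.

```csharp
using System;

// Hypothetical class, invented for illustration only.
public class ExplicitDisposable : IDisposable
{
    public bool IsDisposed { get; private set; }

    // Explicit implementation: reachable only through an IDisposable reference.
    void IDisposable.Dispose()
    {
        IsDisposed = true;
    }
}

// Hypothetical demo class; the names are not from any library.
public static class ExplicitInterfaceDemo
{
    public static bool DisposeThroughCast()
    {
        ExplicitDisposable a = new ExplicitDisposable();
        // a.Dispose();  // would not compile: the class exposes no such method
        ((IDisposable) a).Dispose();
        return a.IsDisposed;
    }

    public static bool DisposeThroughUsing()
    {
        ExplicitDisposable b = new ExplicitDisposable();
        using (b)
        {
            // Work with b would go here.
        }
        return b.IsDisposed;
    }
}
```

Both methods return true. The using statement performs the conversion to IDisposable for you, so explicitly implemented Dispose() methods work with it unchanged.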

People familiar with C++ may note, as Herb Sutter did, that using and IDisposable are little more than a more verbose (and perhaps uglier) form of a C++ destructor. Furthermore, since a C++ destructor is automatically executed (whether upon block exit for stack-based objects or upon delete for heap-based objects), whereas Dispose() must be explicitly invoked, one is much less likely to forget to call a C++ destructor (i.e., essentially never, unless one leaks memory). This is important because it is usually bad to forget to call Dispose() on any object which implements IDisposable once you are done with it. (By the way, Anders Hejlsberg, I wouldn’t mind a construct in C# which provides for automatically calling Dispose() at block-end; it would help eliminate using’s verbosity.)

In my upcoming posts, I will discuss some guidelines for writing classes which implement IDisposable and then describe and demonstrate some useful classes which I have written that implement IDisposable.

Footnotes

[1] If you are interested in how the .NET garbage collector works, read the article Garbage Collector Basics and Performance Hints on MSDN.
[2] Savvy readers may be aware that many garbage collected languages provide a way for the programmer to force (more like strongly suggest) that a garbage collection happen at this instant, such as .NET’s GC.Collect() method.[3]
[3] Extremely savvy readers may be aware that in general calling the GC.Collect() method is a bad idea.
[4] File locking isn’t the only reason to worry about nondeterministic finalization of FileStream objects. Another concern is the fact that FileStream performs buffering, and the data won’t be flushed unless Flush(), Close(), or Dispose() is called. Therefore, if you were to open up a file for writing with the permissive FileShare.Read flag (which probably isn’t a good idea in most cases), there’s a high probability that readers will see incomplete data until the aforementioned functions are called (either explicitly or through a form of deterministic finalization).
[5] I used the example of file locking because it is close to heart. At a previous job I had to deal with the problem of a coworker inadvertently holding onto locks in perpetuity in a daemon process quite a few times. I presume the problem related to not closing the file when exceptions were thrown (otherwise it would have happened more often). Unfortunately the code was apparently poorly designed or not understood and the program was not fixed; instead the solution was to reboot the machine. Yow.