Exploring the .NET CoreFX Part 8: NullReferenceException Performance Tricks

This is part 8 of my Exploring the .NET CoreFX Series.

The .NET Core’s System.Collections.Immutable.ImmutableArray class implements an immutable wrapper around a normal C# managed array. This looks something like:

public struct ImmutableArray<T>
{
    internal T[] array;
    ...
}

ImmutableArray.array is lazy-initialized.

Within the ImmutableArray class, there are a number of methods which have the precondition that ImmutableArray.array must be initialized. These preconditions must be checked before the method begins processing to make sure we handle invalid states correctly.

One example is ImmutableArray.IndexOf, which could have been implemented as:

public struct ImmutableArray<T>
{
    internal T[] array;

    public int IndexOf(T item)
    {
        if (this.array == null)
            throw new NullReferenceException(); // Or maybe InvalidOperationException?
        ...
    }
}

However, the .NET Core team was more clever than that. They realized that they could avoid a (likely easily-predicted) branch and instead implemented it as follows:

public struct ImmutableArray<T>
{
    internal T[] array;

    public int IndexOf(T item)
    {
        this.ThrowNullRefIfNotInitialized();
        ...
    }

    /// <summary>
    /// Throws a null reference exception if the array field is null.
    /// </summary>
    internal void ThrowNullRefIfNotInitialized()
    {
        // Force NullReferenceException if array is null by touching its Length.
        // This way of checking has a nice property of requiring very little code
        // and not having any conditions/branches. 
        // In a faulting scenario we are relying on hardware to generate the fault.
        // And in the non-faulting scenario (most common) the check is virtually free since 
        // if we are going to do anything with the array, we will need Length anyways
        // so touching it, and potentially causing a cache miss, is not going to be an 
        // extra expense.
        var unused = this.array.Length;

        // This line is a workaround for a bug in C# compiler
        // The code in this line will not be emitted, but it will prevent incorrect 
        // optimizing away of "Length" call above in Release builds.
        // TODO: remove the workaround when building with Roslyn which does not have this bug.
        var unused2 = unused;
    }
}

Personally I’d be deathly afraid that a more-clever compiler would remove this code entirely.  I would not implement this pattern unless I had excellent automated unit test coverage for this scenario.

Recommendations

  1. While it may be possible to avoid if (x == null) checks with clever member dereferences, don’t do it; it’s too risky.
Advertisements

Exploring the .NET CoreFX Part 7: Reference Versus Structural Equality

This is part 7 of my Exploring the .NET CoreFX Series.

In the previous post, I referenced EqualityComparer.Default. If T does not implement IEquatable, EqualityComparer.Default will use the framework-defined Object.Equals(), which implements reference equality.

However, many times you want to compare two types for structural equality (i.e. identical content) rather than reference equality (i.e. two references point to the same instance of the class). The interface IStructuralEquatable was defined to allow a class to explicitly implement structural, rather than reference equality. Related classes include IStructuralComparable and StructuralComparisons.

IStructuralEquatable.Equals() also accepts a user-provided IEqualityComparer which will be used to compare the object’s member variables for equality.

Here’s some sample code which demonstrates its use:

// A comparer that considers double.NaN != double.NaN
public class NanComparer : IEqualityComparer
{
   public new bool Equals(object x, object y)
   {
      if (x is double)
         return (double) x == (double) y;
      else 
         return EqualityComparer<object>.Default.Equals(x, y);
   }

   public int GetHashCode(object obj)
   {
      return EqualityComparer<object>.Default.GetHashCode(obj);
   }
}

// C#'s Array implements IStructualEquatable but does not implement IEquatable
double[] array1 = { double.NaN, 1.0, 2.0 };
double[] array2 = { double.NaN, 1.0, 2.0 };

// Compare the arrays for equality using Object.Equals() (reference equality).
Console.WriteLine(array1.Equals(array2)); // outputs false

IStructuralEquatable equ = array1;

// Call IStructuralEquatable.Equals using default comparer.
// EqualityComparer<object>.Default.Equals considers double.NaN to
// be equal to itself.
Console.WriteLine(equ.Equals(array2,
    EqualityComparer<object>.Default)); // outputs true

// Call IStructuralEquatable.Equals using
// StructuralComparisons.StructuralEqualityComparer.  This falls back
// to EqualityComparer<object>.Default.Equals.
Console.WriteLine(equ.Equals(array2,
    StructuralComparisons.StructuralEqualityComparer)); // outputs true

// Call IStructuralEquatable.Equals using NanComparer.
Console.WriteLine(equ.Equals(array2,
    new NanComparer())); // outputs false because NaN != NaN

The .NET Core’s ImmutableArray class implements IStructuralEquatable:

namespace System.Collections.Immutable
{
    /// <summary>
    /// A readonly array with O(1) indexable lookup time.
    /// </summary>
    /// <typeparam name="T">The type of element stored by the array.</typeparam>
    [DebuggerDisplay("{DebuggerDisplay,nq}")]
    public partial struct ImmutableArray<T> : IReadOnlyList<T>, IList<T>,
        IEquatable<ImmutableArray<T>>, IImmutableList<T>, IList,
        IImmutableArray, IStructuralComparable, IStructuralEquatable
    {
        ...
    }
}

It is unclear to me why this is the only collection in System.Collections.Immutable to implement IStructuralEquatable.

Recommendations

  1. If a collection implements IStructuralEquatable, use IStructuralEquatable.Equals() to test for structural equality. Use StructuralComparisons.StructuralEqualityComparer for simple structural equality, or a custom IEqualityComparer otherwise.
  2. If a collection implements IStructuralComparable, use IStructuralComparable.CompareTo() to perform a structural comparison. Use StructuralComparisons.StructuralComparer for simple structural comparisons, or a custom IComparer otherwise.
  3. Consider implementing IStructuralComparable and IStructuralEquatable on custom collections.

Exploring the .NET CoreFX Part 6: Use IEquatable for Higher-Performance Equals()

This is part 6 of my Exploring the .NET CoreFX Series.

Let’s say you are writing a custom IList which contains the following code:

public class MyList<T> : IList<T>
{
    private T[] array;

    ...

    public int IndexOf(T item)
    {
        for (int i = 0; i != array.Length; ++i) {
            if (this.array[i].Equals(item)) {
                return i;
            }
        }

        return -1;
    }
}

The above code uses T‘s implementation of Object.Equals(), which is defined as:

public virtual bool Equals(Object obj);

If T is a value type, it will be automatically boxed by the compiler, which has a slight performance cost. However, if you knew that T implemented IEquatable, then you could avoid the boxing entirely. For example, this code would be slightly better performing than the above for value types:

public class MyList<T> : IList<T> where T : IEquatable<T>
{
    // Everything else is the same as above
}

However, it is inadvisable to require MyList to only contain objects which implement IEquatable.

You can get the best of both worlds by using EqualityComparer.Default to choose IEquatable.Equals() when available, and Object.Equals() otherwise, as follows:

public class MyList<T> : IList<T>
{
    private T[] array;

    ...

    public int IndexOf(T item)
    {
        return IndexOf(item, EqualityComparer<T>.Default);
    }

    public int IndexOf(T item, IEqualityComparer<T> equalityComparer)
    {
        for (int i = 0; i != array.Length; ++i) {
            if (equalityComparer.Equals(this.array[i], item)) {
                return i;
            }
        }

        return -1;
    }
}

Recommendations

  1. Implement IEquatable on value types for which you expect .Equals() to be called a lot, especially those you put into arrays, lists, or hash tables.

Exploring the .NET CoreFX Part 5: Keep Indexers Trivial to Allow JIT Optimization

This is part 5 of my Exploring the .NET CoreFX Series.

This is a simple recommendation based on observations from System.Collections.Immutable.

Recommendations

  1. Keep the implementation of an indexer as trivial as possible to allow the JIT optimization of removing array bounds checking to work. For example, don’t check if a member variable is null; just use it and allow the NullReferenceException to happen naturally.

    In other words, use:

    public T this[int index]
    {
        get
        {
            return this.array[index];
        }
    }
    

    not:

    public T this[int index]
    {
        get
        {
            if (this.array == null)
                throw new NullReferenceException();
            return this.array[index];
        }
    }
    

Exploring the .NET CoreFX Part 4: The Requires Convenience Class

This is part 4 of my Exploring the .NET CoreFX Series. The System.Collections.Immutable project in the .NET CoreFX includes a convenience class called Requires, which looks like:

internal static class Requires
{
    [DebuggerStepThrough]
    public static T NotNull([ValidatedNotNull]T value, string parameterName)
        where T : class // ensures value-types aren't passed to a null checking method
    {
        if (value == null)
        {
            throw new ArgumentNullException(parameterName);
        }

        return value;
    }

    ...
}

This allows other methods to write code like:

public void Foo(string bar)
{
    Requires.NotNull(bar);

    ...
}

Exploring the .NET CoreFX Part 3: Making Methods Debugger-Friendly

This is part 3 of my Exploring the .NET CoreFX Series.

System.Collections.Immutable uses a number of attributes to make it more debugger-friendly. Here are the key attributes:

DebuggerStepThrough

Occasionally a method is so simple that it doesn’t make sense to have the debugger step into it. The System.Diagnostics.DebuggerStepThroughAttribute instructs the debugger to step through the code instead of stepping into the code.

Here is an example from System.Collections.Immutable:

internal static class Requires
{
    [DebuggerStepThrough]
    public static void Range(bool condition, string parameterName, string message = null)
    {
        if (!condition)
        {
            FailRange(parameterName, message);
        }
    }
}

DebuggerBrowsable

The System.Diagnostics.DebuggerBrowsableAttribute determines if and how a member is displayed in the debugger variable windows.

Here are a few examples from System.Collections.Immutable:

public partial struct ImmutableArray<T>: IReadOnlyList<T>, IList<T>, IEquatable<ImmutableArray<T>>, IImmutableList<T>, IList, IImmutableArray, IStructuralComparable, IStructuralEquatable
{
    ...

    [DebuggerBrowsable(DebuggerBrowsableState.RootHidden)]
    internal T[] array;

    ...

    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    bool ICollection<T>.IsReadOnly
    {
        get { return true; }
    }

DebuggerDisplay

The System.Diagnostics.DebuggerDisplayAttribute controls how a class or field is displayed in the debugger variable windows. For example, if you were writing a class to represent a complex number, you could use the DebuggerDisplay attribute to render the complex number as (a, b) or even a + bi.

Here are a few examples from System.Collections.Immutable:

[DebuggerDisplay("Count = {stack != null ? stack.Count : 0}")]
internal static class AllocFreeConcurrentStack<T>
{
    ...
}
[DebuggerDisplay("{DebuggerDisplay,nq}")]
public partial struct ImmutableArray<T> : IReadOnlyList<T>, IList<T>, IEquatable<ImmutableArray<T>>, IImmutableList<T>, IList, IImmutableArray, IStructuralComparable, IStructuralEquatable
{
    ...
}

DebuggerTypeProxy

The System.Diagnostics.DebuggerTypeProxyAttribute allows the developer to specify a display proxy for a type, allowing the developer to completely tailor the view for the type.

Here is an example from System.Collections.Immutable:

public partial struct ImmutableArray<T>
{
    [DebuggerTypeProxy(typeof(ImmutableArray<>.Builder.DebuggerProxy))]
    public sealed class Builder : IList<T>, IReadOnlyList<T>
    {
        ....

Recommendations

  1. Judiciously use DebuggerStepThrough, DebuggerBrowsable, DebuggerDisplay, and DebuggerTypeProxy to make your code more debugger-friendly.

Exploring the .NET CoreFX Part 2: Cache ThreadLocal Variables in Locals

This is part 2 of my Exploring the .NET CoreFX Series.

Thread-local storage allows you to mark a global or static variable as local to a thread. In Win32, thread-local storage is provided by the functions TlsAlloc, TlsGetValue, TlsSetValue, and TlsFree. Similarly, C# provides System.ThreadStaticAttribute and System.Threading.ThreadLocal.

Unfortunately, thread-local storage comes at a cost. Reading or writing a thread-local variable is far more expensive than reading or writing a local variable. System.Collections.Immutable uses a trick or two to help ameliorate this expense. For example, System.Collections.Immutable caches thread-local variables in local variables in a method to avoid unnecessary TLS hits on repeated access. Here’s some sample code which implements this:

[ThreadStatic]
private static Stack<RefAsValueType<T>> stack;

public static void TryAdd(T item)
{
    Stack<RefAsValueType<T>> localStack = stack; // cache in a local to avoid unnecessary TLS hits on repeated accesses
    if (localStack == null)
    {
        stack = localStack = new Stack<RefAsValueType<T>>(MaxSize);
    }

    // Just in case we're in a scenario where an object is continually requested on one thread
    // and returned on another, avoid unbounded growth of the stack.
    if (localStack.Count < MaxSize)
    {
        localStack.Push(new RefAsValueType<T>(item));
    }
}

Recommendations

  1. Minimize the use of thread-local storage. If you can, avoid it entirely.
  2. Minimize the number of times code accesses TLS variables. Consider caching thread-local variables in local variables in a method.