Imagine we have a simple class, a wrapper around some array of structs (better data locality etc.):
|
public class ArrayChunkOfStructs<T> where T : struct { private readonly T[] _array; public ArrayChunkOfStructs(int size) { _array = new T[size]; } ... } |
Now, I would like to have an efficient access to every element. Obviously, a trivial indexer would be inefficient here, as it would return a copy of the given array element (a struct):
|
public T this[int index] => _array[index]; |
Luckily, since C# 7.0 we can “ref return” to efficiently return a reference to a given array element which is super nice (refer to my article about ref for more info):
|
public ref T this[int index] => ref _array[index]; |
Here, 99.9999% of devs will stop and will be satisfied with the semantics and performance results. But… if we know we will call it tremendously often, can we do better?!
First of all, let’s see what is being JITted by the .NET Core x64 runtime (5.0rc) when accessing 9th element (index is 8):
|
; AbusingCSharp.Benchmarks.ArrayChunkBenchmarks.TestIndexerRef() sub rsp,28 mov rax,[rcx+8] mov rax,[rax+8] cmp dword ptr [rax+8],8 jbe short M00_L00 add rax,30 add rsp,28 ret M00_L00: call CORINFO_HELP_RNGCHKFAIL int 3 ; Total bytes of code 33 |
To those who know assembler a little, it may be clear what is going on here. But let’s make a short summary:
- we see a little of “stack frame” creation here (sub/add rsp) – could we get rid of it in such a simple method?
- we see bound check in line 4 (cmp the index to 8) to check if we are accessing an array with a correct index – could we get rid of it because we trust our code? 😇
Disclaimer: Getting rid of bound checks is very risky and the resulting dangers probably will overcome the performance benefits. Thus, use it only after heavy consideration, if you are sure why you need it and you can ensure caller’s code will be correct (providing valid indices).
To continue, we will be walking on thin ice of unsafe code now.
The first idea is to use Unsafe.Add to provide kind of “pointer arithmetic” – add an index-element to the first element:
|
public ref T ItemRef(int index) => ref Unsafe.Add(ref _array[0], index); |
The “problem” here is, it produces almost identical results because _array[0] is still a bound-checked array access (and we don’t get rid of stack frame too):
|
; AbusingCSharp.Benchmarks.ArrayChunkBenchmarks.TestItemRef() sub rsp,28 mov rax,[rcx+8] mov rax,[rax+8] cmp dword ptr [rax+8],0 jbe short M00_L00 add rax,10 add rax,20 add rsp,28 ret M00_L00: call CORINFO_HELP_RNGCHKFAIL int 3 ; Total bytes of code 37 |
Hence, the non trivial question arises – how to get the address/ref to the first element of an array?
We could think of doing some Span-based magic (to use MemoryMarshal.GetReference):
|
public ref T ItemRef2(int index) { var span = new Span<T>(_array); return ref Unsafe.Add(ref MemoryMarshal.GetReference(span), index); } |
But you can probably feel it – it produces even slower and bigger code (Span creation handling etc.) while still bound check will be there (Span is “safe”).
So, we need somehow to find a better way of getting an address of the first array’s element. The thing is, the internal structure of the array type is an implementation detail (although well-known). How can we overcome that?
The idea is… to rely on that implementation detail. This approach is being used by DangerousGetReferenceAt method from Microsoft.Toolkit.HighPerformance package maintained by Sergio Pedri. DangerousGetReferenceAt source code explains it well:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
|
public static ref T DangerousGetReferenceAt<T>(this T[] array, int i) { #if NETCORE_RUNTIME var arrayData = Unsafe.As<RawArrayData>(array); ref T r0 = ref Unsafe.As<byte, T>(ref arrayData.Data); ref T ri = ref Unsafe.Add(ref r0, i); return ref ri; #else ... #endif } // Description taken from CoreCLR: see https://source.dot.net/#System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.CoreCLR.cs,285. // CLR arrays are laid out in memory as follows (multidimensional array bounds are optional): // [ sync block || pMethodTable || num components || MD array bounds || array data .. ] // ^ ^ ^ returned reference // | \-- ref Unsafe.As<RawArrayData>(array).Data // \-- array // The base size of an array includes all the fields before the array data, // including the sync block and method table. The reference to RawData.Data // points at the number of components, skipping over these two pointer-sized fields. [StructLayout(LayoutKind.Sequential)] private sealed class RawArrayData { #pragma warning disable CS0649 // Unassigned fields #pragma warning disable SA1401 // Fields should be private public IntPtr Length; public byte Data; #pragma warning restore CS0649 #pragma warning restore SA1401 } |
So, we are casting (reinterpreting) an array reference as a reference to some artificial RawArrayData class, which has a layout corresponding to an array layout. Thus, getting “data” reference is now just trivial. No bound checks at all!
The good news is this method has been ported to .NET 5! So, in .NET 5.0rc we can already use MemoryMarshal.GetArrayDataReference which does exactly the same thing:
|
public static ref T GetArrayDataReference<T>(T[] array) => ref Unsafe.As<byte, T>(ref Unsafe.As<RawArrayData>(array).Data); |
Thus, without any external dependencies our code in .NET 5 may be rewritten to:
|
public ref T ItemRef3(int index) { ref var data = ref MemoryMarshal.GetArrayDataReference(_array); return ref Unsafe.Add(ref data, index); } |
And the resulting code is indeed much more lightweight:
|
; AbusingCSharp.Benchmarks.ArrayChunkBenchmarks.TestItemRef3() mov rax,[rcx+8] mov rax,[rax+8] cmp [rax],eax add rax,10 add rax,20 ret ; Total bytes of code 19 |
No bound-checks, and as an additional reward from the method simplicity – no stack frame.
Benchmarks are indeed showing a noticeable (well, in ns order of magnitude) difference:
|
| Method | Mean | Error | StdDev | Ratio | |--------------- |----------:|----------:|----------:|------:| | TestIndexerRef | 0.4254 ns | 0.0433 ns | 0.0464 ns | 1.00 | | TestItemRef | 0.3497 ns | 0.0261 ns | 0.0244 ns | 0.82 | | TestItemRef2 | 1.4914 ns | 0.0204 ns | 0.0171 ns | 3.50 | | TestItemRef3 | 0.0942 ns | 0.0194 ns | 0.0172 ns | 0.22 | |
Which simply means, we are now about 5x faster than with the initial solution!
Disclaimer #2: Approach taken here with the usage of GetArrayDataReferece is super dangerous. As Levi Broderick, one of .NET framework developers, said: “Also, read the method documentation. It does more than remove bounds checks; it also removes array variance checks. So it might not be valid to write to the ref, even if the index is within bounds. Misuse of the method will bite you in the ass, guaranteed.” Moreover, documentation clearly states that “a reference may be used for pinning but must never be dereferenced” 😇