Disclaimer – this article consists of fragments of my book, adapted and re-edited considerably to be presented in the form of an independent whole post.
As explained in the previous article, managed pointers have well-justified limitations – in particular, they are not allowed to appear on the Managed Heap (as a field of a reference type or through boxing). However, in some scenarios it would be really useful to have a type that contains a managed pointer. The main motivation behind such a type is Span<T> – which should be able to represent references “inside” objects (interior pointers), stack addresses or even unmanaged memory.
Ref struct (byref-like type)
Such a type should have limitations similar to those of the managed pointer itself (so as not to break the limitations of the contained managed pointer). Thus, those kinds of types are commonly called byref-like types (as the other name of the managed pointer is simply byref). The most important limitation of such a type is that it must be impossible to have heap-allocated instances. Regular structs are stack-allocated by default but may end up on the heap in various scenarios, like boxing (for example because of casting to an interface). Thus the direction seems obvious – structs with some additional restrictions should be introduced.
Since C# 7.2 we can declare custom byref-like types in the form of ref structs by adding a ref modifier to the struct declaration:
    public ref struct RefBook
    {
        public string Title;
        public string Author;
    }
The C# compiler imposes many limitations on ref structs (to make sure that they will only be stack-allocated):
- It cannot be declared as a field of a class or a normal struct (because a class instance lives on the heap and a normal struct could be boxed).
- It cannot be declared as a static field for the same reasons.
- It cannot be boxed – so it is not possible to assign/cast it to object, dynamic or any interface type. It is also not possible to use it as an array element type, as array elements are stored inside the array object, which lives on the heap.
- It cannot be used in iterators, cannot be a generic argument and cannot implement an interface (because it could become boxed then).
- It cannot be used as a local variable in an async method – as it could be boxed as a part of the async state machine.
- It cannot be captured by lambda expressions or local functions – as it would be boxed by the corresponding closure class.
Trying to use a ref struct in those situations ends with a compilation error:
    public class RefBookTest
    {
        private RefBook book; // Compilation error: Field or auto-implemented property cannot be of type 'RefBook' unless it is an instance member of a ref struct

        public void Test()
        {
            RefBook localBook = new RefBook();
            object box = (object) localBook; // Compilation error: Cannot convert type 'CoreCLR.UnsafeTests.RefBook' to 'object'
            RefBook[] array = new RefBook[4]; // Compilation error: Array elements cannot be of type 'RefBook'
        }
    }
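The listing above covers the field, boxing and array cases. The remaining restrictions from the list can be sketched in the same way. In the hypothetical RefBookRestrictions class below (a name invented for this illustration), the rejected lines are left commented out, and one possible workaround is shown instead: copy the needed data into an ordinary local first, so that only the copy gets captured.

```csharp
using System;
using System.Collections.Generic;

public ref struct RefBook
{
    public string Title;
    public string Author;
}

// Hypothetical helper, invented for this illustration.
public static class RefBookRestrictions
{
    public static string TitleViaLambda()
    {
        RefBook book = new RefBook { Title = "Some Title", Author = "Some Author" };

        // Both lines below are rejected by the compiler if uncommented:
        // Func<string> bad = () => book.Title;  // the closure class would have to capture 'book'
        // var list = new List<RefBook>();       // a ref struct used as a generic argument

        // Workaround: capture a plain string instead of the ref struct itself.
        string title = book.Title;
        Func<string> getTitle = () => title;
        return getTitle(); // returns "Some Title"
    }
}
```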
Similarly to managed pointers, ref structs can be used only as method parameters and local variables. It is also possible to use a ref struct as a field type of another ref struct:
    public ref struct RefBook
    {
        public string Title;
        public string Author;
        public RefPublisher Publisher;
    }

    public ref struct RefPublisher
    {
        public string Name;
    }
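To make the allowed usages concrete, here is a minimal sketch that uses the two types as a local variable, a method parameter and a return value. The RefBookUsage class and its Create and Describe methods are assumptions made for this example and do not come from the book.

```csharp
public ref struct RefPublisher
{
    public string Name;
}

public ref struct RefBook
{
    public string Title;
    public string Author;
    public RefPublisher Publisher;
}

// Hypothetical helper, invented for this illustration.
public static class RefBookUsage
{
    // A ref struct can be a local variable and can be returned by value.
    public static RefBook Create(string title, string author, string publisher)
    {
        RefBook book = new RefBook();
        book.Title = title;
        book.Author = author;
        book.Publisher = new RefPublisher { Name = publisher };
        return book;
    }

    // A ref struct can be a method parameter; 'in' passes it by readonly reference.
    public static string Describe(in RefBook book)
    {
        return book.Title + " by " + book.Author + " (" + book.Publisher.Name + ")";
    }
}
```

Calling RefBookUsage.Describe(RefBookUsage.Create("T", "A", "P")) yields "T by A (P)", and at no point does any RefBook instance need to leave the stack.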
Additionally, we can declare a readonly ref struct to combine the readonly and ref struct features – that is, to declare an immutable struct that will exist only on the stack. It helps the C# compiler and the JIT compiler to make further optimizations when using such structs (like skipping defensive copies).
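A minimal sketch of such a declaration could look as follows. ReadOnlyRefBook and ReadOnlyRefBookDemo are hypothetical names chosen for this example. Because the whole struct is readonly, calling a method on an in parameter should not require the compiler to copy the struct first.

```csharp
public readonly ref struct ReadOnlyRefBook
{
    // In a readonly struct every instance field must itself be readonly.
    public readonly string Title;
    public readonly string Author;

    public ReadOnlyRefBook(string title, string author)
    {
        Title = title;
        Author = author;
    }

    public string Describe()
    {
        return Title + " by " + Author;
    }
}

// Hypothetical helper, invented for this illustration.
public static class ReadOnlyRefBookDemo
{
    // 'in' passes the struct by readonly reference; since the type is readonly,
    // the method call below cannot mutate it and needs no defensive copy.
    public static string Print(in ReadOnlyRefBook book)
    {
        return book.Describe();
    }
}
```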
Although we already know what ref structs provide, one may wonder where they can be useful, if anywhere at all. Obviously, if they were not useful, they would not have been introduced. They provide two very important features based on their limitations:
- they will never be heap-allocated – this allows them to be used in a special way because their lifetime guarantees are quite strong. As mentioned at the beginning, the main advantage is that they may contain a managed pointer as their field. Currently, this feature is not directly exposed in C#, but it is used internally by Span<T> in the form of ByReference<T> (see below).
- they will never be accessed from multiple threads – as it is illegal to pass stack addresses between threads, it is guaranteed that a stack-allocated ref struct is accessed only by its own thread. This trivially eliminates any troublesome synchronization issues without any synchronization costs.
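The first point is easiest to observe through Span<T> itself, the ref struct this design was built for. The sketch below (SpanSources is a hypothetical name, and the values are arbitrary) creates one span pointing into the middle of a heap-allocated array and another over stack memory, then processes both with the same method.

```csharp
using System;

// Hypothetical helper, invented for this illustration.
public static class SpanSources
{
    public static int Sum(Span<int> values)
    {
        int total = 0;
        foreach (int value in values)
        {
            total += value;
        }
        return total;
    }

    public static int Demo()
    {
        int[] array = { 1, 2, 3, 4, 5 };

        // Interior pointer: the span starts at array[1] and covers 3 elements.
        Span<int> middle = array.AsSpan(1, 3);
        middle[0] = 42; // writes through to array[1]

        // Stack memory: assigning stackalloc to a Span<int> needs no unsafe context.
        Span<int> onStack = stackalloc int[3];
        onStack[0] = 10;
        onStack[1] = 20;
        onStack[2] = 30;

        // (42 + 3 + 4) + (10 + 20 + 30) = 109
        return Sum(middle) + Sum(onStack);
    }
}
```

Both spans are plain locals here. Storing either of them in a class field would be rejected by the compiler, which is exactly what keeps the stack address inside onStack from outliving its frame.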
One could ask why a “stackonly” keyword is not used instead of the ref keyword when declaring “ref structs”. It seems to be a more self-explanatory name. The reason is that “ref structs” impose stronger limitations than a simple “stack-only allocation”: as listed above, for example, they can’t be used as generic arguments or as pointer types. Thus, naming them “stackonly” would be slightly misleading.
ByReference (byref-like instance field)
Having byref-like types, one could think of byref-like instance fields – a managed pointer could be a part of a byref-like type because their limitations are related. In other words, a managed pointer may safely be a field of a stack-only ref struct because it is guaranteed that it will not escape to the heap.
Unfortunately, neither C# nor CIL supports such byref-like instance fields, and runtime changes are required. Those were introduced only in .NET Core 2.1 (and later). Specifically for the Span<T> type, a new intrinsic (implemented in the runtime) type has been introduced to represent such a byref-like instance field. We could imagine it looks like this:
    public readonly ref partial struct Span<T>
    {
        internal readonly ref T _pointer;
        private readonly int _length;
        ...
    }
But C# does not support any syntax to represent byref-like fields, so until they are added (if ever), a dedicated type has been introduced to represent such fields. This type is named ByReference<T>, so the true declaration of Span<T> looks like this:
    // ByReference<T> is meant to be used to represent "ref T" fields. It is
    // working around lack of first class support for byref fields in C# and IL.
    // The JIT and type loader has special handling for it that turns it
    // into a thin wrapper around ref T.
    [NonVersionable]
    internal ref struct ByReference<T>
    {
        private IntPtr _value;
        ...
    }

    public readonly ref partial struct Span<T>
    {
        /// <summary>A byref or a native ptr.</summary>
        internal readonly ByReference<T> _pointer;
        /// <summary>The number of elements this Span contains.</summary>
        private readonly int _length;
        ...
    }
ByReference<T> is an internal type (it cannot be used outside CoreFX) and is handled specially by the runtime, which treats it as a wrapper around a managed pointer.
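Although ByReference<T> cannot be used directly, its effect can be observed from user code. As far as I know, the public MemoryMarshal.CreateSpan method (in System.Runtime.InteropServices, available in .NET Core 2.1 and later) wraps an arbitrary managed pointer in a Span<T>. The sketch below uses it to point a one-element span at a field inside a heap object; Holder and ByRefSpanDemo are hypothetical names invented for this example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical class, invented for this illustration.
public class Holder
{
    public int Value;
}

public static class ByRefSpanDemo
{
    public static int Run()
    {
        var holder = new Holder { Value = 1 };

        // 'ref holder.Value' is an interior pointer into the Holder object;
        // the span stores it together with a length of 1.
        Span<int> one = MemoryMarshal.CreateSpan(ref holder.Value, 1);
        one[0] = 7; // writes through the managed pointer

        return holder.Value; // returns 7
    }
}
```

Note that CreateSpan does no lifetime or bounds validation of its own, so passing a length larger than the memory actually available behind the reference is the caller's problem.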
Note. General byref-like fields? Is there a chance that general-purpose byref fields will be introduced to C#? It is unlikely that allowing them in classes will ever be justified (it would, in fact, introduce heap-to-heap interior pointers). It gives too little compared to the difficulty of implementation. But what about general-purpose byref-like fields allowed only in byref-like (ref struct) types? Will code like “internal readonly ref T _pointer” in the above listing ever be possible? There are ongoing discussions. Besides array slicing already exposed via Span<T>, one could think of other usages of such fields: structs that are interconnected by pointers for faster traversal, returning multiple byref results in a single byref-like struct, and so on. However, as far as I know at the time of writing, the CLR team has no plans to generalize this feature.
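For readers on newer toolchains: C# 11 (together with .NET 7) did later add ref fields to ref structs, so a declaration in the spirit of “internal readonly ref T _pointer” can now be written directly. The sketch below assumes C# 11 or later; SingleRef<T> and RefFieldDemo are hypothetical names, not types from the framework.

```csharp
// Requires C# 11 or later (ref fields). Hypothetical type, invented for this illustration.
public ref struct SingleRef<T>
{
    // A managed pointer stored directly as a field of a ref struct.
    private ref T _target;

    public SingleRef(ref T target)
    {
        _target = ref target; // ref assignment: stores the pointer, not a copy of the value
    }

    // Hands the stored managed pointer back to the caller.
    public ref T Value => ref _target;
}

public static class RefFieldDemo
{
    public static int Run()
    {
        int number = 1;
        var wrapper = new SingleRef<int>(ref number);
        wrapper.Value = 5; // writes through the ref field into 'number'
        return number;     // returns 5
    }
}
```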
In the upcoming blog post, we will see two different Span<T> implementations – the so-called “fast” Span (using the runtime support presented here, in the form of ByReference<T>) and the “slow” Span (a backward-compatible implementation without the runtime support).