.NET reference

Almost every article about .NET memory tells the same story – “there are value types allocated on the stack and reference types allocated on the heap”. And, “classes are reference types while structs are value types”. There are so many popular job interview questions for .NET developers touching on this topic. But this is by far not the most appropriate way of seeing the difference between value types and reference types. Why is it not quite correct? Because it describes the concept from the implementation point of view, not from the point of view that explains the true difference between those two categories of types. This was already explained in the popular articles The Stack Is An Implementation Detail, Part One and The Stack Is An Implementation Detail, Part Two.

We will delve into implementation details later, but it is worth noting that they are still only implementation details. And like all implementations behind some kind of abstraction, they are subject to change. What really matters is the abstraction they provide to the developer. So instead of taking the same implementation-driven approach, I would like to present the rationale behind it. Only then can we reach the point where understanding the current implementation becomes possible (and sensible, too).

Let’s start from the beginning, which is the ECMA-335 standard. Unfortunately, the definitions we need are a little blurry, and you can get lost in the different meanings of words like type, value, value type, value of a type, and so on. In general, it is worth remembering that this standard states that:

“any value described by a type is called an instance of that type”

In other words, we can speak about a value (or instance, interchangeably) of a value type or a reference type. Going further, those are defined as:

“type, value: A type such that an instance of it directly contains all its data. (…) The values described by a value type are self-contained.”

“type, reference: A type such that an instance of it contains a reference to its data. (…) A value described by a reference type denotes the location of another value.”

We can spot here the true difference in the abstraction that those two kinds of types provide: instances (values) of value types contain all their data in place (they are, in fact, the values themselves), while values of reference types only point to data located “somewhere” (they reference something). But this data-location abstraction implies a very significant consequence that relates to some fundamental topics:

Lifetime:

  • Values of value types contain all their data – we can see each one as a single, self-contained being. The data lives as long as the instance of the value type itself.
  • Values of reference types denote the location of another value, whose lifetime is not defined by the definition itself.

Sharing:

  • A value type’s value cannot be shared by default – if we would like to use it in another place (for example, although we are passing a bit of implementation detail here, as a method argument or another local variable), it will be copied byte by byte by default. We then speak of pass-by-value semantics. And as a copy of the value is passed to another place, the lifetime of the original value does not change.
  • A reference type’s value can be shared by default – if we would like to use it in another place, pass-by-reference semantics will be used by default. Hence, after that, one more reference type instance denotes the same value location. We have to somehow track all references to discover the value’s lifetime.

Identity:

  • Value types do not have an identity. They are identical if and only if the bit sequences of their data are the same.
  • Reference types are identical if and only if their locations are the same.

Again, there is not a single mention of the heap or the stack in this context at all. Keeping those differences and definitions in mind should clarify things a little, although you may need a while to get used to them. Next time you are asked during a job interview where value types are stored, you may start with such an alternative, extended elaboration.

Note. There is yet another category of types we should know – immutable types. An immutable type is a type whose value cannot be changed after creation. No more and no less. Immutability says nothing about value or reference semantics – in other words, both a value type and a reference type can be immutable. We can enforce immutability in object-oriented programming simply by not exposing any methods and properties that would lead to changing an object’s value.
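For illustration, here is a minimal sketch of both flavors (the Money and Person types below are invented for this example):

```csharp
public struct Money                // an immutable value type
{
    public decimal Amount { get; } // get-only property, set once in the constructor
    public Money(decimal amount) => Amount = amount;
}

public sealed class Person         // an immutable reference type
{
    public string Name { get; }
    public Person(string name) => Name = name;
}
```

In both cases, once an instance is created there is simply no member that could mutate its value – the value or reference semantics stay untouched.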

Locations

When considering the .NET stack machine, we should mention the important concept of locations. When considering storage of the various values required for program execution, a few logical locations exist:

  • local variables in a method
  • arguments of a method
  • instance field of another value
  • static field (inside class, interface or module)
  • local memory pool
  • temporarily on the evaluation stack

Type Storage

One could insist on asking – where is the place here that implies using the stack or the heap for those two basic kinds of types? The answer is – there is none! This is an implementation detail, a decision taken during the design of Microsoft’s .NET Framework implementation of the CLI standard. Because it was for years overwhelmingly the most popular one, the “value types allocated on the stack and reference types allocated on the heap” story has been repeated again and again like a mantra, without deep reflection. And since it is a very good design decision, it was repeated in the different CLI implementations we have discussed earlier. Keep in mind that this sentence is not entirely true in the first place. As we will see in the following sections, there are exceptions to that rule. Different locations can be treated differently as to how a value is stored. And this is exactly the case with the CLI, as we will soon see.

Nevertheless, we can only think about the storage of value types and reference types when designing a CLI implementation for a specific platform. We simply need to know whether we have a stack or a heap available at all on that particular platform! As the vast majority of today’s computers have both, the decision is simple. But then we probably also have CPU registers, and no one mentions them in the “value types allocated on the…” mantra, although they are the same level of implementation detail as using the stack or the heap.

The truth is that the storage implementation for one type or another lives mostly in the JIT compiler design. This is a component designed for the specific platform on which it is running, so we know what resources will be available there. An x86/x64-based JIT obviously has the stack, the heap, and registers at its disposal. However, the decision on where to store a given type’s value does not have to be left only at the JIT compiler level. We can allow the compiler to influence this decision based on the analysis it performs. And we can even expose such a decision to the developer at the language level (exactly like in C++, where you can allocate objects both on the stack and on the heap).

There is an even simpler approach, taken by Java, where there are no user-defined value types at all – hence no problem of where to store them exists! A few built-in primitives (integers and so forth) are said to be value types there, but everything else is allocated on the heap (not taking into consideration escape analysis, described later). In the case of the .NET design, we could also decide to allocate all type instances (including value types) on the heap, and it would be perfectly fine as long as the value type and reference type semantics were not violated. When talking about memory location, the ECMA-335 standard gives complete freedom:

“The four areas of the method state – incoming arguments array, local variables array, local memory pool and evaluation stack – are specified as if logically distinct areas. A conforming implementation of the CLI can map these areas into one contiguous array of memory, held as a conventional stack frame on the underlying target architecture, or use any other equivalent representation technique.”

Why these and not other implementation decisions were taken will be more practical to explain in the following sections, discussing the value types and the reference types separately.

Note. There is only a single important remark left. Even though we now know that talking about the stack and the heap is an implementation detail, it can still be reasonable to do so. Unfortunately, there is a place where “as it should be” is at odds with “what is practical”. And this place is performance and memory usage optimization. If we are writing our code in C# targeting x86/x64 or ARM computers, we know perfectly well that the heap, the stack, and registers will be used by those types in certain scenarios. So, as The Law of Leaky Abstractions says, the value or reference type abstraction can leak here. And if we want, we can take advantage of it for performance reasons.

Value Types

As previously said, a value type “directly contains all its data”. ECMA-335 defines a value as:

“A simple bit pattern for something like an integer or a float. Each value has a type that describes both the storage that it occupies and the meanings of the bits in its representation, and also the operations that can be performed on that representation. Values are intended for representing the simple types and non-objects in programming languages.”

So what about the “value types are stored on the stack” part of the story? Regarding implementation, there is nothing stopping us from storing all value types on the heap, irrespective of the location used – except for the fact that there is a better solution: using the stack or a CPU register. The stack is quite a lightweight mechanism. We can “allocate” and “deallocate” objects there simply by creating a properly sized activation frame and dismissing it when no longer needed. As the stack seems to be so fast, we should use it all the time, right?

The problem is that this is not always possible, mainly because of the lifetime of the stack data versus the desired lifetime of the value itself. It is the life span and value sharing that determine which mechanism we can use to store value type data.

Let’s now consider each possible location of a value type and what storage we can use there:

  • local variables in a method – they have a very strict and well-defined lifetime, which is the lifetime of a method call (and all its subcalls). We could allocate all value-type local variables on the heap and then just deallocate them when the method ends. But we can also use the stack here because we know there is only a single instance of the value (there is no sharing of it). So there is no risk that someone will try to use this value after the method ends, or concurrently from another thread. It is then perfectly fine to use the stack inside an activation frame as storage for local value types (or to use CPU registers).
  • arguments of a method – they can be treated exactly like local variables here, so again, we can use the stack instead of the heap.
  • instance field of a reference type – its lifetime depends on the lifetime of the containing object, which may well live longer than the current or any other activation frame, so the stack is not the right place for it. Hence, value types that are fields of reference types (like classes) will be allocated on the heap along with them (which we know as one of the boxing reasons).
  • instance field of another value type – here the situation is slightly more complicated. If the containing value is on the stack, we will use the stack for the field as well. If it is already on the heap, we will use the heap for the field’s value too.
  • static field (inside a class, interface, or module) – here the situation is similar to an instance field of a reference type. The static field has the lifetime of the type in which it is defined. This means we cannot use the stack as storage, as an activation frame may live much shorter than that.
  • local memory pool – its lifetime is strictly related to the method’s lifetime (ECMA says “the local memory pool is reclaimed on method exit”). This means we can use the stack without a problem, and that’s why the local memory pool is implemented as a growth of the activation frame.
  • temporarily on the evaluation stack – a value on the evaluation stack has a lifetime strictly controlled by the JIT. It knows perfectly well why this value is needed and when it will be consumed. Hence, it has complete freedom in whether it would like to use the heap, the stack, or a register. For performance reasons, it will obviously try to use CPU registers and the stack.

So that is how we come to the first part of the story – “value types are stored on the stack”. As we can see, a more accurate statement would be: “value types are stored on the stack when the value is a local variable or lives inside the local memory pool, but are stored on the heap when they are a part of other objects on the heap or are a static field; and they can always be stored inside a CPU register as a part of evaluation stack processing”. Slightly more complicated, isn’t it?
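To make this more tangible, here is a small sketch annotating where each value would typically end up in the common x86/x64 implementation (assuming C# 7.2+, Span&lt;T&gt; available, and no special JIT optimizations):

```csharp
using System;

class Counter                       // a reference type - its instances live on the heap
{
    private int instanceField;      // value type as an instance field - stored on the
                                    // heap, inline within the containing object
    private static int staticField; // static field - also heap-based storage

    void Work()
    {
        int local = 42;             // local variable - stack or a CPU register
        Span<byte> buffer = stackalloc byte[64]; // local memory pool - a growth of
                                                 // the activation frame, so the stack
        local++;                    // intermediate results - evaluation stack,
                                    // typically enregistered by the JIT
    }
}
```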

Reference Types

When talking about reference types, it is convenient to consider them as consisting of two entities:

  • reference – a value of the reference type is a reference to its data. This reference is, in particular, an address of data stored elsewhere. A reference itself can be seen as a value type because internally it is just a 32- or 64-bit wide address. References have copy-by-value semantics, so when passed between locations, they are just copied.
  • reference type’s data – this is the memory region denoted by the reference. The standard does not define where this data should be stored; it is just stored elsewhere.


Considering possible storage for each location of a reference type is simpler than for value types. As mentioned, because references can be shared, their lifetime is not well-defined. In the general case, it is impossible to store instances of reference types on the stack because their lifetime is probably much longer than an activation frame’s life (a method call’s duration). Hence it is quite an obvious implementation decision where to store them, and that is how we come to the “reference types are stored on the heap” part of the story.

Regarding the heap allocation possibilities for reference types – there is one exception. If we could know that a reference type instance has the same characteristics as a local value-type variable, we could allocate it on the stack, as usual for value types. In particular, this means we should know that a reference does not escape its local scope (does not escape the stack or thread) and start to be shared among other references. A way of checking this is called Escape Analysis. It has been successfully implemented in Java, where it is especially beneficial because of Java’s approach of allocating almost everything on the heap by default. At the time of this writing, the .NET environment does not support Escape Analysis yet. Well, at least not officially. And this is the topic we will look at in the next blog post!

During our trip around managed pointers and structs, we have the last topic to discuss – readonly semantics. Thus, today we touch on topics like readonly structs and readonly parameters.

Readonly ref variables

Ref variables are quite powerful because we may change their target. Thus, readonly refs were introduced in C# 7.2, which control the ability to mutate the storage of a ref variable. Please note a subtle difference in this context between a managed pointer to a value type versus one to a reference type:

  • for a value type target – it guarantees that the value will not be modified. As the value here is the whole object (a memory region), in other words, it guarantees that none of its fields will be changed.
  • for a reference type target – it guarantees that the reference value will not be changed. As the value here is the reference itself (pointing to another object), it guarantees that we will not change it to point to another object. But we can still modify the properties of the referenced object.

Let’s use an example returning a readonly ref:
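A sketch of such a collection (its exact members are an assumption made for this illustration):

```csharp
public struct Book
{
    public string Title;
    public string Author;
}

public class BookCollection
{
    private Book[] books = { new Book { Title = "Call of the Wild", Author = "Jack London" } };
    private Book nobook = default(Book);

    // Returns a readonly managed pointer to a Book stored inside the collection.
    public ref readonly Book GetBookByTitle(string title)
    {
        for (int i = 0; i < books.Length; i++)
        {
            if (title == books[i].Title)
                return ref books[i];
        }
        return ref nobook;
    }
}
```

With Book being a struct, writing through the result (e.g. `book.Title = "New"` on a `ref readonly Book` local) is a compile-time error; had Book been a class, only redirecting the reference would be forbidden.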

BookCollection may illustrate the difference between readonly ref in the case of both a value type and a reference type.

Among the many things coming with the upcoming C# 8.0, one perfectly fits the topic of ref structs I’ve raised in my previous post – disposable ref structs.

As one of the blog posts announcing C# 8.0 changes (in Visual Studio 2019 Preview 2) mentions:

“Ref structs were introduced in C# 7.2, and this is not the place to reiterate their usefulness, but in return they come with some severe limitations, such as not being able to implement interfaces. Ref structs can now be disposable without implementing the IDisposable interface, simply by having a Dispose method in them.”
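A minimal sketch of what this enables (the TempBuffer type below is invented for illustration):

```csharp
using System;

public ref struct TempBuffer
{
    private Span<byte> span;

    public TempBuffer(Span<byte> span) => this.span = span;

    // No IDisposable here - a ref struct cannot implement interfaces.
    // Since C# 8.0 the compiler recognizes this method by pattern,
    // so TempBuffer may be used in a using statement anyway.
    public void Dispose() => span.Clear();
}

class Demo
{
    static void Main()
    {
        using (var buffer = new TempBuffer(stackalloc byte[128]))
        {
            // work with the stack-only buffer here...
        }   // Dispose() is called despite the missing interface
    }
}
```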


Disclaimer – this article consists of fragments of my book, adapted and re-edited considerably to be presented in the form of an independent whole post.

As already explained in the previous article, managed pointers have their well-justified limitations – especially in that they are not allowed to appear on the managed heap (as a field of a reference type, or just by being boxed). However, for some scenarios it would be really nice to have a type that contains a managed pointer. The main motivation behind such a type is Span&lt;T&gt; – which should be able to represent references “inside” objects (interior pointers), stack addresses, or even unmanaged memory.

Ref struct (byref-like type)

Such a type should have limitations similar to the managed pointer itself (so as not to break the limitations of the contained managed pointer). Thus, those kinds of types are commonly called byref-like types (as the other name of the managed pointer is simply byref). The most important limitation of such a type should be the impossibility of having heap-allocated instances. Thus the direction seems obvious – structs with some additional restrictions should be introduced. Regular structs are stack-allocated by default but may be heap-allocated in various scenarios, like boxing (for example, because of casting to an interface).
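As a quick illustration of those restrictions in C# 7.2 (the types below are made up for this sketch):

```csharp
using System;

public ref struct ByRefLike
{
    public Span<byte> Data;          // may contain other byref-like fields
}

public class Container
{
    // public ByRefLike Field;       // compile-time error - a field of a class
                                     // would put the instance on the heap
}

class Program
{
    static void Main()
    {
        var local = new ByRefLike(); // fine - a local always lives on the stack
        // object boxed = local;     // compile-time error - boxing would move
                                     // the instance to the heap
    }
}
```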

Disclaimer – this article consists of fragments of my book, adapted and re-edited considerably to be presented in the form of an independent whole post.

Most of the time a regular .NET developer uses object references, and it is simple enough because this is how the managed world is constructed – objects reference each other via object references. An object reference is, in fact, a type-safe pointer (address) that always points to an object’s MethodTable reference field (it is often said that it points at the beginning of an object). Thus, using them may be quite efficient. Having an object reference, we simply have the whole object’s address. For example, the GC can quickly access its header via a constant offset. Addresses of fields are also easily computable thanks to information stored in the MethodTable.

There is, however, another pointer type in the CLR – the managed pointer. It could be defined as a more general type of reference that may point to locations other than just the beginning of an object. ECMA-335 says that a managed pointer can point to:

  • a local variable – whether it holds a reference to a heap-allocated object or is simply a stack-allocated value,
  • a parameter – like above,
  • a field of a compound type – meaning a field of another type (whether a value or a reference type),
  • an element of an array.

Despite this flexibility, managed pointers are still types. There is a managed pointer type that points to System.Int32 values, regardless of their location, denoted as System.Int32& in CIL. Or a SomeNamespace.SomeClass& type pointing to our custom SomeNamespace.SomeClass instances. Strong typing makes them safer than pure, unmanaged pointers, which may be cast back and forth to literally anything. This is also why managed pointers do not offer the pointer arithmetic known from raw pointers – it particularly does not make sense to “add” or “subtract” the addresses they represent, pointing at various places inside objects or at local variables.
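In C#, managed pointers surface as ref locals and ref parameters. A small sketch of the kinds of targets listed above:

```csharp
class RefLocalsDemo
{
    private static int staticField = 10;

    static void Main()
    {
        int local = 20;
        int[] array = { 30, 40 };

        ref int byLocal = ref local;        // System.Int32& to a local variable
        ref int byElement = ref array[0];   // System.Int32& to an array element
        ref int byField = ref staticField;  // System.Int32& to a field

        byElement = 42;                     // writes through the managed pointer
        System.Console.WriteLine(array[0]); // prints 42
    }
}
```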


TL;DR – would post-mortem finalization, available thanks to phantom references, be useful in .NET? What is your opinion, especially based on your experience with finalization in your use cases? Please share your insights in the comments!

Both the JVM and the CLR have the concept of finalizers, which are a way of implicit (non-deterministic) cleanup – at some point after an object is recognized as no longer reachable (and thus may be garbage collected), we may take an action specified by the finalizer – a special, dedicated method (i.e. Finalize in C#, finalize in Java). This is mostly used for the purpose of cleaning/releasing non-managed resources held by the object to be reclaimed (like OS-limited, and thus valuable, file or socket handles).
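For example, in C# a finalizer is written with destructor-like syntax and compiled into a Finalize override (the native-handle scenario below is just an illustration):

```csharp
using System;
using System.Runtime.InteropServices;

public class NativeResourceHolder
{
    private IntPtr handle;           // some OS handle not tracked by the GC

    public NativeResourceHolder(IntPtr handle) => this.handle = handle;

    ~NativeResourceHolder()          // the finalizer - called (if at all) some
    {                                // time after the object becomes unreachable
        if (handle != IntPtr.Zero)
            CloseHandle(handle);     // last-resort release of the native resource
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool CloseHandle(IntPtr hObject);
}
```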

However, such a form of finalization has its caveats (elaborated in detail below). That’s why in Java 9 the finalize() method (and thus finalization in general) has been deprecated, which is nicely explained in the documentation:

“Deprecated. The finalization mechanism is inherently problematic. Finalization can lead to performance issues, deadlocks, and hangs. Errors in finalizers can lead to resource leaks; there is no way to cancel finalization if it is no longer necessary; and no order is specified among calls to finalize methods of different objects. Furthermore, there are no guarantees regarding the timing of finalization. The finalize method might be called on a finalizable object only after an indefinite delay, if at all.”


As a part of my consultancy job, I have the pleasure of helping various customers with problems that could be described collectively as GC-related (or memory-related in general). One day Tamir Dresher from the Clarizen company (BTW, an author of Rx.NET in Action) contacted me with an extremely interesting message (emphasis mine):

We are experiencing a phenomenon of GC duration of 15 minutes in our backend servers. (…) Do you think we can have a session with you and perhaps you’ll have ideas on how to find the root cause?

15 minutes! That’s an eternity! If we see something like this, one thought comes to mind – something really serious must be happening there! As nowadays most such problems may be diagnosed remotely, after signing NDAs we could go straight into attacking the problem. Clarizen provided a very well-prepared and concise summary of their architecture and current findings.

Hi all! I am thrilled to announce that after more than two years of intensive book writing, it is finally available for preorder! Its roughly 800 pages are dedicated solely to the topic of .NET memory management and its Garbage Collector, with many, many internal workings of all this covered. I personally believe that there is currently no single book, or even a finite set of articles online, that gives such comprehensive insight into this topic.



As a person who sincerely loves .NET and related performance topics – and who has spent quite a lot of time diagnosing various .NET memory-related issues – I simply needed to write such a book. And as it covers all the recent changes in .NET Core 2.1 (including Span, Memory, and pipelines), I believe there is no better time to publish it!

Let me give you an excerpt from the introduction of the book, which should explain my intentions when writing it:


This blog post was written as part of the preparations while writing the book about .NET, which will be announced in a few weeks. If you want to be informed about its publication and receive auxiliary materials, feel free to subscribe to my newsletter. Many thanks to Stephen Toub, who helped in reviewing this code.

Async programming is becoming more and more popular. While very convenient to use, from a performance perspective there are scenarios where regular Task-returning async methods have one serious drawback: they need to allocate a new Task to represent the operation (and its result). Such a heap-allocated Task is unavoidable in the truly asynchronous path of execution because async continuations are not guaranteed to be executed on the same thread – thus the operation must persist on the heap, not on the stack.

However, there are cases where async operations may complete synchronously (because some conditions are met really fast). It would be nice to avoid heap-allocating a Task in such a case, created just to pass the result of an operation. Exactly for this purpose the ValueTask type was introduced in .NET Core 2.0 (along with the corresponding AsyncValueTaskMethodBuilder handling the underlying state machine). Initially, it was a struct made as a discriminated union, which could take one of two possible values:

  • a ready-to-use result (if the operation completed successfully synchronously)
  • a normal Task which may be awaited (if the operation became truly asynchronous)

In other words, ValueTask helps in handling the synchronous path of async method execution. Thanks to being a struct (which will be allocated on the stack or enregistered into a CPU register), the synchronous result of the operation can be returned without heap allocations. Only in the case of an asynchronous path will a new Task eventually be heap-allocated by the underlying machinery:
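A minimal sketch of such a method (the buffering logic below is invented):

```csharp
using System.Threading.Tasks;

public class Reader
{
    private int? buffered;

    public ValueTask<int> ReadNextAsync()
    {
        if (buffered.HasValue)                         // synchronous path:
            return new ValueTask<int>(buffered.Value); // just a struct, no allocation

        return new ValueTask<int>(ReadSlowAsync());    // asynchronous path:
    }                                                  // wraps a heap-allocated Task

    private async Task<int> ReadSlowAsync()
    {
        await Task.Delay(100);                         // stands in for real async I/O
        return 42;
    }
}
```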

But what if we were able to not heap-allocate a Task even in the case of an asynchronous path? This may be useful in very, very high-performance code, avoiding async-related allocations altogether. As already said, something on the heap must represent our async operation because there is no thread affinity. But why not use heap-allocated, pooled objects for that, reused between successive async operations?

Indeed, since .NET Core 2.1, ValueTask can also wrap an object implementing the IValueTaskSource interface. Such an object can be pooled and reused to minimize allocations. It represents our operation, and the underlying AsyncValueTaskMethodBuilder is aware of it, calling the appropriate methods of the IValueTaskSource interface. Here is how the previous example looks with the help of the custom IValueTaskSource implementation described in this post:
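The usage boils down to something like the following sketch (the ObjectPool constructor shape is an assumption; its Rent and Return methods are described below):

```csharp
// inside an async method:
var pool = new ObjectPool<FileReadingPooledValueTaskSource>(
    () => new FileReadingPooledValueTaskSource());

FileReadingPooledValueTaskSource source = pool.Rent();
string result = await source.RunAsync("someFile.txt", pool);
// the source returns itself to the pool once consumed - do not touch it afterwards
Console.WriteLine(result);
```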

Thanks to pooling, even on the asynchronous path there will be no allocations (at least most of the time, if our pool is used efficiently). We can await such a method in the regular way and the underlying machinery will take care of it.

Note. If you would like to hear more about Task, ValueTask, and IValueTaskSource again, in similar but different words, please look at the great post Task, Async Await, ValueTask, IValueTaskSource and how to keep your sanity in modern .NET world by Szymon Kulec.

Implementation details

Although the rationale behind IValueTaskSource seems to be clear, as well as its usage presented above, implementing it is not trivial. When implementing the IValueTaskSource interface, we must implement the following three methods:
* GetResult – called only once, when the async state machine needs to obtain the result of the operation
* GetStatus – called by the async state machine to check the status of the operation
* OnCompleted – called by the async state machine when the wrapping ValueTask has been awaited. We should remember here the continuation to be called when the operation completes (but if it has already completed, we should call the continuation immediately)

Seems easy, right? Read on to see if it really is! All source code described here is available in my PooledValueTaskSource repository on GitHub. There are quite a lot of comments in the code, but this post explains the most relevant parts as well. Also, do not be surprised by the many diagnostic Console.Write calls in this code – they serve to illustrate the internal workings of this class in the prepared example program (also available in the repository).

In my custom implementation, I use object pooling based on an ObjectPool class taken from the internal Roslyn source code and refactored a little (mostly renamed) – I’ve omitted it here for brevity as it is not so relevant. From our perspective, there are the obvious Rent and Return methods, period.

My implementation is also mostly based on the code of AwaitableSocketAsyncEventArgs in System.Net.Sockets.Socket and AsyncIOOperation in the ASP.NET IIS Integration code. What I’ve tried to do is provide a Minimum Viable Product that is correct and working (stripping as much as possible from the mentioned code).

My custom IValueTaskSource represents an operation that returns a string read from the provided file. Obviously, one would probably like to introduce a more generic class with a generic result type and the action provided as a lambda expression. However, to not clutter the example too much, I’ve decided to prepare it for such a “hardcoded”, specific scenario. Feel free to contribute more generic versions!

Let’s start with the fields that FileReadingPooledValueTaskSource contains:
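A sketch of them (the names follow the descriptions below; a pool field is assumed so the instance can return itself after use):

```csharp
public sealed class FileReadingPooledValueTaskSource // : IValueTaskSource<string>
{
    // Sentinel meaning "the operation completed before anyone awaited it".
    private static readonly Action<object> CallbackCompleted = _ => { };

    private Action<object> continuation; // continuation registered by the awaiter
    private string result;               // result of the operation (success path)
    private Exception exception;         // failure of the operation (if any)
    private short token;                 // best-effort guard against misuse
    private object state;                // state used by the async machinery
    private ObjectPool<FileReadingPooledValueTaskSource> pool; // where to return ourselves

    // IValueTaskSource<string> members are shown in the following listings.
}
```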

The most important fields of FileReadingPooledValueTaskSource include:

  • Action&lt;object&gt; continuation – it represents a continuation to be executed when our operation ends
  • string result – it keeps the result of our operation (in case of success)
  • Exception exception – it keeps an Exception instance that happened during the execution of our operation (in case of failure)
  • short token – the current token value given to a ValueTask and then verified against the value it passes back to us. This is not meant to be a completely reliable mechanism and doesn’t require additional synchronization, etc. It’s purely a best-effort attempt to catch misuse, including awaiting a value task twice or after it’s already been reused by someone else
  • object state – state internally used by the asynchronous machinery
  • static readonly Action&lt;object&gt; CallbackCompleted – a sentinel object used to indicate that the operation has completed prior to OnCompleted being called

Let’s now look at each IValueTaskSource method implementation. GetResult is quite easy – it will be called only once by the underlying state machine, when we inform it that our operation has completed (via the GetStatus method explained soon). Thus, we need to reset the object’s state (to be reusable), return it to the pool, and return the result (or throw an exception in case of failure):
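A simplified sketch (a full implementation would preserve the original stack trace via ExceptionDispatchInfo):

```csharp
public string GetResult(short token)
{
    if (token != this.token)
        throw new InvalidOperationException("Mismatched token - stale ValueTask?");

    string localResult = this.result;
    Exception localException = this.exception;

    ResetAndReleaseOperation();   // make this instance reusable and pool it back

    if (localException != null)
        throw localException;
    return localResult;
}
```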

GetStatus is called by the state machine to check the current status of our operation. In my case, I assume it is completed if result is no longer null. Depending on the exception field, it is then succeeded or faulted:
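Roughly like this (a sketch following the “result is no longer null” convention described above):

```csharp
public ValueTaskSourceStatus GetStatus(short token)
{
    if (this.result == null)
        return ValueTaskSourceStatus.Pending;

    return this.exception == null
        ? ValueTaskSourceStatus.Succeeded
        : ValueTaskSourceStatus.Faulted;
}
```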

The most complex is the OnCompleted method implementation. It is called by the underlying state machine when the wrapped ValueTask is being awaited. Two scenarios may happen here and must be handled:

  • if the operation has not yet completed – we store the provided continuation to be executed once the operation completes
  • if the operation has already completed – in such a case our internal continuation should already be set to the CallbackCompleted value. If so, we simply invoke the continuation here

Please note how much code is dedicated to properly getting the context of the continuation (with respect to the provided ValueTaskSourceOnCompletedFlags):
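A heavily simplified sketch of that logic (the real code additionally captures and restores ExecutionContext and the scheduling context according to the flags):

```csharp
public void OnCompleted(Action<object> continuation, object state, short token,
                        ValueTaskSourceOnCompletedFlags flags)
{
    if (token != this.token)
        throw new InvalidOperationException("Mismatched token - stale ValueTask?");

    this.state = state;

    // Publish the continuation; if the operation has already completed,
    // the field contains the CallbackCompleted sentinel instead of null.
    var previous = Interlocked.CompareExchange(ref this.continuation, continuation, null);
    if (ReferenceEquals(previous, CallbackCompleted))
    {
        // The operation finished before being awaited - invoke immediately
        // (a full implementation dispatches on the captured context).
        continuation(state);
    }
}
```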

This concludes implementing the IValueTaskSource methods, but we need to add two more crucial pieces to this puzzle: a method that starts our operation and a method that is called when the operation completes asynchronously.

The first one, named by me as simply as RunAsync (called in our example at the beginning of the article), is responsible for executing the main work:
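A hedged sketch of it, with a file-existence check standing in for the “may complete synchronously” condition:

```csharp
public ValueTask<string> RunAsync(string filename,
                                  ObjectPool<FileReadingPooledValueTaskSource> pool)
{
    this.pool = pool;

    if (!File.Exists(filename))
    {
        // Synchronous completion - no state machine involvement needed at all.
        ResetAndReleaseOperation();
        return new ValueTask<string>(string.Empty);
    }

    // Asynchronous path - start the work and return a ValueTask wrapping... ourselves.
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try
        {
            NotifyAsyncWorkCompletion(File.ReadAllText(filename));
        }
        catch (Exception ex)
        {
            NotifyAsyncWorkCompletion(null, ex);
        }
    });
    return new ValueTask<string>(this, this.token);
}
```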

I’ve implemented here a simulation of an operation that may complete both immediately (synchronously) and asynchronously. In the case of the synchronous path, the returned ValueTask passes the result directly, so we avoid allocations again. In the asynchronous case, the key is to return a ValueTask that wraps… ourselves. It may then be awaited, while we have also started the asynchronous processing (simulated by thread pool work in our case).

When the asynchronous operation finishes, the NotifyAsyncWorkCompletion method will be called (remember – in a real-world scenario this would be some callback registered in asynchronous I/O or another low-level API). The responsibility of this method is simple:

  • it stores the result and/or exception
  • if the operation has not been awaited yet (in such a case this.continuation will be null) – it only sets the continuation to CallbackCompleted. The continuation will be executed in the OnCompleted method when the ValueTask is awaited
  • if the operation is already awaited (in such a case this.continuation contains the awaited continuation) – it executes the continuation in the appropriate context (which, again, is quite a complex process)
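In code, this could be sketched as follows (again simplified with respect to context handling):

```csharp
private void NotifyAsyncWorkCompletion(string result, Exception exception = null)
{
    this.result = result;
    this.exception = exception;

    // If no one has awaited us yet, leave the sentinel behind -
    // OnCompleted will notice it and invoke the continuation itself.
    var previous = Interlocked.CompareExchange(ref this.continuation,
                                               CallbackCompleted, null);
    if (previous != null)
    {
        // Already awaited - run the stored continuation (a full implementation
        // dispatches it on the captured scheduling/execution context).
        previous(this.state);
    }
}
```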

 

The above-mentioned clearing and returning to the pool is implemented in ResetAndReleaseOperation (yes, I know, SRP is dying here – refactor!). The only field we cannot clear is token, which is solely dedicated to detecting incorrect re-usage of those objects:
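Sketched as:

```csharp
private void ResetAndReleaseOperation()
{
    // Bump the token so any stale ValueTask still holding this instance
    // will be rejected on its next call - this is the field we never clear.
    this.token++;

    this.result = null;
    this.exception = null;
    this.state = null;
    this.continuation = null;

    this.pool.Return(this);
}
```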

And… only this little code is necessary to avoid heap allocations in the case of async operations!

Remarks

  • I do not claim that my code in its current form is ideal. Quite the opposite, I still expect it to be far from ideal! It serves as an illustration and a base for further development. Please feel invited to comment and to contribute to making it better!
  • The current repository is oversimplified – due to the work on the book, I do not have time to reorganize it properly (especially to include comprehensive unit tests). Again, feel free to contribute!

 


A few months ago I wrote an article about Zero GC in .NET Core 2.0. This proof of concept was based on a preview version of .NET Core 2.0, in which the possibility to plug in a custom garbage collector was added. Such a “standalone GC”, as it was named, required a custom CoreCLR compilation because it was not enabled by default. Quite a lot of other tweaks were necessary to make this work – especially, including the required headers from the CoreCLR code was very cumbersome.

However, the upcoming .NET Core 2.1 contains many improvements in that field, so I’ve decided to write a follow-up post. I’ve also answered one of the questions bothering me for a long time (well, at least started answering…) – what would real usage of Zero GC look like in the context of an ASP.NET Core application?

.NET Core 2.1 changes

Here is a short summary of the most important changes. I’ve updated the CoreCLR.Zero repository to reflect them.

  • first of all, as previously mentioned, now standalone GC is pluggable by default so no custom CoreCLR is required. We will be able to plug our custom GC just by setting a single environment variable:
  • as standalone GC matured, documentation in CoreCLR appeared
  • a great improvement is that code between library implementing standalone GC and CoreCLR has been greatly decoupled. Now it is possible to include only a few files directly from CoreCLR code to have things compiled:

    Previously I had to create my own headers with some of the declarations copy-pasted from CoreCLR, which was obviously not maintainable and cumbersome.
  • loading path has been refactored slightly. InitializeGarbageCollector inside CoreCLR calls GCHeapUtilities::LoadAndInitialize() with the following code inside:

    Inside LoadAndInitializeGC there is a brand new functionality – verification of the GC/EE interface version match. It checks whether the version used by the standalone GC library (returned by the GC_VersionInfo function) matches the runtime version – the major version must match and the minor version must be equal or higher. Additionally, the GC initialization function has been renamed to GC_Initialize.
  • the core logic of my poor man’s allocator remained the same, so please refer to the original article for details

ASP.NET Core 2.1 integration

As this CoreCLR feature has matured, I’ve decided to use the standard .NET CLI instead of CoreRun.exe. This allowed me to easily test the question bothering me for a long time – how will even the simplest ASP.NET Core application consume memory without garbage collection? .NET Core 2.1 is still in preview, so I’ve just used the latest daily build of the .NET CLI to create a WebApi project:

I’ve modified the Controller a little to do something more dynamic than just returning two string literals:
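It could have been something along these lines (a guess at the shape, not the original code):

```csharp
[Route("api/[controller]")]
public class ValuesController : Controller
{
    [HttpGet]
    public IEnumerable<string> Get()
    {
        // something allocating and dynamic instead of two constant literals
        return new[] { "value" + DateTime.UtcNow.Ticks, Guid.NewGuid().ToString() };
    }
}
```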

Additionally, I’ve disabled Server GC, which is enabled by default. Obviously, setting the GC mode does not make sense as there is no GC at all, right? However, Server GC crashes the runtime because the GC’s JIT_WriteBarrier_SVR64 is used, which requires a valid card table address – and there are no card tables either 🙂

Then we simply compile and run, remembering to set the environment variable:
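On Windows that boils down to something like the following (COMPlus_GCName is the variable CoreCLR uses to locate a standalone GC library; the path is a placeholder):

```
set COMPlus_GCName=C:\path\to\the\standalone\GC\library.dll
dotnet run -c Release
```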

Everything should be running fine, so… congratulations! We’ve just run an ASP.NET Core application on .NET Core with a standalone GC plugged in that is doing nothing but allocating.

Benchmarks

I’ve created the same WebApi via the regular .NET Core 2.0 CLI for reference. Then, via SuperBenchmarker, I started a simple load test: 10 concurrent users making 100,000 requests in total with a 10 ms delay between each request.

.NET Core 2.1 with Zero GC:

[Figure: SuperBenchmarker results for .NET Core 2.1 with Zero GC]

.NET Core 2.0:

[Figure: SuperBenchmarker results for .NET Core 2.0]

As we can see, the classic GC from .NET Core was able to process slightly more requests (357.8 requests/second) compared to the version with Zero GC plugged in. This does not surprise me at all because my version uses the most primitive allocation based on calloc. I’m quite surprised that Zero GC is doing so well after all. However, this is not so interesting because I assume that replacing calloc with a simple bump-a-pointer allocation would improve performance noticeably.

What is interesting is the memory usage over time. As you can see in the chart below, after a minute of such a test, the process using Zero GC takes around 1 GB of memory. This is… quite a lot. I am not sure yet how to interpret this. The version with the regular GC ended with a stable 120 MB size. Both started from a fresh run.

[Figure: memory usage over time – Zero GC versus the regular GC]

This would mean that each REST WebApi request triggers around 55 kB of allocations. Any comments will be appreciated here…

Update 30.01.2018: After debugging allocations during a single ASP.NET request, it turns out most of them come from RouterMiddleware. This is no surprise, as currently this application does almost nothing but routing… I’ve uploaded a sample log of such a single request, which seems to be minimal (others allocate some buffers from time to time). It consumes around 7 kB of memory.