It is said that picture is worth a thousand words, and I agree. That’s why I like preparing technical drawings to explain various concepts. So, here it is – a short story of how async/await works in .NET.

thereisnothread

The main power behind async/await is that while we “await” on an ongoing I/O operation, the calling thread may be released for doing other work. And this provides a great thread re-usability. Thus, better scalability – much smaller number of threads is able to handle the same amount of operations comparing to asynchronous/waiting approach.

The main role here plays so-called overlapped I/O (in case of Windows) which allows to asynchronously delegate the I/O operation to the operating system, and only after completion the provided callback will notify us about the result. The main workforce here is so-called I/O completion port (IOCP).Continue reading

poh01

In the upcoming .NET 5 a very interesting change is added to the GC – a dedicated Pinned Object Heap, a very new type of the managed heap segment (as we have Small and Large Object Heaps so far). Pinning has its own costs, because it introduces fragmentation (and in general complicates object compaction a lot). We are used to have some good practices about it, like “pin only for…:

  • a very short time” so, the GC will not bother – to reduce probability that the GC happens while many objects were pinned. That’s a scenario to use fixed keyword, which is in fact only a very lightweight way of flagging particular local variable as a pinned reference. As long as GC does not happen, there is no additional overhead.
  • a very long time”, so the GC will promote those objects to generation 2 – as gen2 GCs should be not so common, the impact will be minimized also. That’s a scenario to use GCHandle of type Pinned, which is a little bigger overhead because we need to allocate/free handle.

However, even if applied, those rules will produce some fragmentation, depending how much you pin, for how long, what’s the resulting layout of the pinned objects in memory and many other, intermittent conditions.

So, in the end, it would be perfect just to get rid of pinned objects and move them to a different place than SOH/LOH. This separate place would be simply ignored, by the GC design, when considering heap compaction so we will get pinning behaviour out of the box.Continue reading

cilvalid

Everyone knows that C# is a strongly typed language and incorrect type usage is simply not possible there. So, the following program will just not compile:

That’s good, it means we can trust Roslyn (C# compiler) not to generate improper type-safety code. But what if we rewrite the same code to the Common Intermediate Language, omitting completely C# and its compiler?

First of all, it will be assembled by ILASM tool without any errors because it is a syntactically correct CIL. And ILASM is not a compiler, so it will not do any type checks on its own. So we end up with an assembly file with a smelly CIL inside. If not using ILASM, we could also simply modify CIL with the help of any tool like dnSpy.

Ok, let’s say that is fine. But what will happen when we try to execute such code? Will .NET runtime verify somehow the CIL of those methods? Just-In-Time compiler for sure will notice type mismatch and do something to prevent executing it, right?

What will happen is the program will just execute without any errors and will print 4 (the length of “Test”) followed by… 0 in a new line. The truth is that JIT or any other part of .NET runtime does not examine type safety.

Why the result is 0? Because when the JIT emits native code of a particular method, it uses type layout information of the data/types being used. And it happens that string.Length property is just an inlined method call that access the very first int field of an object (because string length is stored there):

As we pass a newly created object instance, which always has one pointer-sized field initialized to zero (this is a requirement of the current GC), the result is 0.

And yes, if we pass a reference to an object with some int field, its value will be returned (again, instead of throwing any type-safety related runtime exception). The following code (when converted to CIL) will execute with no errors and print 44!

This all may be quite suprising, so what ECMA-335 standard says about it? Point “II.3 Validation and verification” mentions all CIL verification rules and algorithms and states:

“Aside from these rules, this standard leaves as unspecified:

  • The time at which (if ever) such an algorithm should be performed.
  • What a conforming implementation should do in the event of a verification failure.”

And:

“Ordinarily, a conforming implementation of the CLI can allow unverifiable code (valid code that does not pass verification) to be executed, although this can be subject to administrative trust controls that are not part of this standard.”

While indeed .NET runtime does some validation, it does not verify the IL. The difference? If we run the following code:

It will end up with System.InvalidProgramException: Common Language Runtime detected an invalid program. being thrown. So, we can summarize it as the fact that invalid CIL code may trigger InvalidProgramException for some cases, but for others will just allow the program to execute (with many unexpected results). And all this may happen only during JIT compilation, at runtime.

So, what can we do to protect ourselves, before deploying and running it on production? We need to verify our IL on our own. There is PEVerify tool for exactly that purpose, shipped with .NET Framework SDK. You can find one in a folder similar to c:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.8 Tools\x64\.

When running against our example, it will indeed detect an incorrect method with a proper explanation:

The only problem with PEVerify is… it does not support .NET Core.

What for .NET Core then? There is ILVerify, a cross-platform, open source counterpart of it developed as a part of CoreRT runtime (although it supports analyzing both .NET Framework and .NET Core). Currently, to have it working we need to compile the whole CoreRT (How to run ILVerify? issue #6198) OR you can use unofficial Microsoft.DotNet.ILVerification package to write your own command line tool (inspired by the original Program.cs).

So, nothing officially supported and shipped with the runtime itself, yet. And BTW, there is ongoing process to make Roslyn IL verification fully working as well.

Sidenote

The previous example was a little simplified because ConsumeString(string) called a virtual get_Length method on a sealed string type, so it was aggressively inlined. If we experiment with regular virtual method on a not sealed type, things become more intermittent because now the call is using virtual stub dispatch mechanism. In the following example (again, if rewritten to CIL), how Consume will behave depends on what we have passed as an argument and where the pointers of VSD will follow (most likely, triggering access violation).

Conclusions

  • if you do write in CIL, to have more power in hands (like using Reflection.Emit, manipulate CIL fore the code weaving or any other magic like the whole Unsafe class), please be aware of the difference between validation and verification. And verify your assembly on your own, as JIT compiler will not do it!
  • if you do want to trust your app FULLY, run IL verification before executing it. Probably it could be even added to you CI pipeline as an additional check – you may trust your code but not someone else code (and the code modified by the tools you use). And yes, it is not straightforward currently in .NET Core case.

Subscribe to my mailing list dedicated for .NET performance and internals related stuff!

Please select all the ways you would like to hear from Konrad Kokosa:

You can unsubscribe at any time by clicking the link in the footer of our emails. For information about our privacy practices, please visit our website.

Mobius Overview

.NET application is “just” a piece of CIL bytecode to be executed by the .NET runtime. And .NET runtime is “just” a program that is able to perform this task. It happens that currently .NET Framework/.NET Core runtimes are written in C++. I am also fully aware of CoreRT that was .NET runtime with many parts rewritten to C# (like type system) but still, crucial parts (including JIT compiler and the GC) were left written in C++.

But what if we write .NET runtime as… .NET application? Is is possible at all? I mean, literally no native/C++ code, everything running as .NET Core application written in C#? Does this sound like kind of inception and infinite recursion? It would require running one .NET runtime on the top of another .NET runtime, right?

I decided to check it out and that’s how Mobius runtime idea has been coined! Yeah, I know it sound strange and I do not expect it will be anything close to production ready thingy in the nearest century. I am fully aware of the amount of code needed to be written to make full .NET runtime. However, I found it interesting to validate such idea and I find it small usages as well. Imagine a NuGet package with the separate runtime that you can add to your application 😉

Continue reading

GC posters

In short words, I’ve prepared two posters about .NET memory management. They provide a comprehensive summary of “what’s inside .NET GC”, based on .NET Core (although almost all information is relevant also for .NET Framework).

The first shows a static point of view – how memory is organized into segments, generations, what are the roots and etc.:

Poster I

The second shows a dynamic point of view – how GC threads are working and what GC modes are available:

Poster II

 

You can download them for FREE in vector PDF format from my https://prodotnetmemory.com site. Take it and print it!

 

 

bothgamesbanner_white

2 Developers from Poland join forces to publish their IT-related card games: from Devs to Devs – OutOfMemory and IT Startup!

Each of us already has a published book and now we want to share some knowledge with fun games! Remember all the original 151 Pokemon names? How about playing a game that lets you remember something useful: like what technologies are beneficial to learn to be a better Developer! One of the games is already published in Poland and sold over 3k copies (so we how already some publishing know how). Now we want to start a company and publish 2 of our IT related card games worldwide in English.

Have fun while playing our print and play prototypes!

We plan to publish the games as one company on Kicktarter in Q1 2020.

Continue reading

zerogclead

A few months ago I wrote an article about Zero GC in .NET Core 2.0. This proof of concept was based on a preview version of .NET Core 2.0 in which a possibility to plug in custom garbage collector has been added. Such “standalone GC”, as it was named, required custom CoreCLR compilation because it was not enabled by default. Quite a lot of other tweaks were necessary to make this working – especially including required headers from CoreCLR code was very cumbersome.

However upcoming .NET Core 2.1 contains many improvements in that field so I’ve decided to write follow up post. I’ve also answered one of the questions bothering me for a long time (well, at least started answering…) – how would real usage of Zero GC like in the context of ASP.NET Core application?

.NET Core 2.1 changes

Here is a short summary of most important changes. I’ve updated CoreCLR.Zero repository to reflect them.

  • first of all, as previously mentioned, now standalone GC is pluggable by default so no custom CoreCLR is required. We will be able to plug our custom GC just by setting a single environment variable:
  • as standalone GC matured, documentation in CoreCLR appeared
  • a great improvement is that code between library implementing standalone GC and CoreCLR has been greatly decoupled. Now it is possible to include only a few files directly from CoreCLR code to have things compiled:

    Previously I had to create my own headers with some of the declarations from CoreCLR copy-pasted which was obviously not maintanable and cumbersome.
  • loading path has been refactored slightly. InitializeGarbageCollector inside CoreCLR calls GCHeapUtilities::LoadAndInitialize() with the following code inside:

    Inside LoadAndInitializeGC there is a brand new functionality – verification of GC/EE interface version match. It checks whether version used by standalone GC library (returned by GC_VersionInfo function) matches the runtime version – major version must match and minor version must be equal or higher. Additionaly, GC initialization function has been renamed to GC_Initialize.
  • core logic of my the poor man’s allocator remained the same so please refer to the original article for details

ASP.NET Core 2.1 integration

As this CoreCLR feature has matured, I’ve decided do use standard .NET CLI instead of CoreRun.exe. This allowed me to easily test the question bothering me for a long time – how even the simplest ASP.NET Core application will consume memory without garbage collection? .NET Core 2.1 is still in preview so I’ve just used Latest Daily Build of .NET CLI to create WebApi project:

I’ve modified Controller a little to do something more dynamic that just returning two string literals:

Additionally, I’ve disabled Server GC which is enabled by default. Obviously setting GC mode does not make sense as there is no GC at all, right? However, Server GC crashes runtime because GC JIT_WriteBarrier_SVR64 is being used which requires valid card table address – and there are no card tables either 🙂

Then we simply compile and run, remembering about the environment variable:

Everything should be running fine so… congratulations! We’ve just run ASP.NET Core application on .NET Core with standalone GC plugged in which is doing nothing but allocating.

Benchmarks

I’ve created the same WebApi via regular .NET Core 2.0 CLI for reference. Then via SuperBenchmarker I’ve started simple load test: 10 concurrent users making 100 000 requests in total with 10 ms delay between each request.

.NET Core 2.1 with Zero GC:

zerogcaspnetcore02

.NET Core 2.0:

zerogcaspnetcore03

As we can see classic GC from .NET Core was able to process slightly more requests (357.8 requests/second) comparing to version with Zero GC plugged in. It does not surprise me at all because my version uses the most primitive allocation based on calloc. I’m quite surprised that Zero GC is doing so well after all. However, this is not so interesting because I assume that replacing calloc with a simple bump a pointer allocation would improve performance noticeably.

What is interesting is the memory usage over time. As you can see in the chart below, after a minute of such test, the process using Zero GC takes around 1 GB of memory. This is… quite a lot. Not sure yet how to interpret this. Version with regular GC ended with a stable 120 MB size. Both started from fresh run.

zerogcaspnetcore

This would mean that each REST WebApi requests triggers around 55 kB of allocations. Any comments will be appreciated here…

Update 30.01.2018: After debugging allocations during single ASP.NET requests, most of them comes from RouterMiddleware. This is no surprise as currently this application does almost nothing but routing… I’ve uploaded sample log of such single request which seems to be minimal (others are allocating some buffers from time to time). It consumes around 7 kB of memory.

ThreeDotNetos

I would like to announce with pleasure the initiative of Three Dot Netos. I am very excited because the preparations have been going on for several months. And here it is finally. I can officially and publicly announce it!

But… what?

We get in the car and start on the road through Poland. 5 cities, day by day. Every evening a different city and other people but the same topics – .NET performance, .NET internals and other advanced .NET themes. Hell of a ride for your brain! There will be no mercy. If you’re bored with sessions on .NET at other conferences, now you should be happy! Of course, this is not meant to be an empty talk. All topics discussed will be practical. But we will not repeat again the same boring “reference types are on the heap and value types are on the stack“. Oh no no! Detailed agenda will be announced in a few weeks. But be sure it will be interesting.

In the first edition, we will speak Polish. But who knows what the future will bring. However, if you know some Polish guys – tell them about us! Spread the word, this will always be helpful for planning further initiatives. Please 🙂

Continue reading

Tune screenshot

I would like to present you a new tool I’ve started to work on recently. I’ve called it The Ultimate .NET Experiment (Tune) as its purpose is to learn .NET internals and performance tuning by experiments with C# code. As it is currently in very early 0.2 version, it can be treated as Proof Of Concept with many, many features still missing. But it is usable enough to have some fun with it already.

The main way of working with this tool is as follows:

  • write a sample, valid C# script which contains at least one class with public method taking a single string parameter. It will be executed by hitting Run button. This script can contain as many additional methods and classes as you wish. Just remember that first public method from the first public class will be executed (with single parameter taken from the input box below the script). You may also choose whether you want to build in Debug or Release mode (note: currently it is only x64 bit compilation).
  • after clicking Run button, the script will be compiled and executed. Additionally, it will be decompiled both to IL (Intermediate Language) and assembly code in the corresponding tab.
  • all the time Tune is running (including time during script execution) a graph with GC data is being drawn. It shows information about generation sizes and GC occurrences (as vertical lines with the number below showing which generation has been triggered).

Continue reading

Zero Garbage Collector on .NET Core

Starting from .NET Core 2.0 coupling between Garbage Collector and the Execution Engine itself have been loosened. Prior to this version, the Garbage Collector code was pretty much tangled with the rest of the CoreCLR code. However, Local GC initiative in version 2.0 is already mature enough to start using it. The purpose of the exercise we are going to do is to prepare Zero Garbage Collector that replaces the default one.

Zero Garbage Collector is the simplest possible implementation that in fact does almost nothing. It only allows you to allocate objects, because this is obviously required by the Execution Engine. Created objects are never automatically deleted and theoretically, no longer needed memory is never reclaimed. Why one would be interested in such a simple GC implementation? There are at least two reasons:Continue reading