Imagine we have a simple class, a wrapper around some array of structs (better data locality etc.):

Now, I would like to have efficient access to every element. Obviously, a trivial indexer would be inefficient here, as it would return a copy of the given array element (a struct):

Luckily, since C# 7.0 we can “ref return” to efficiently return a reference to a given array element which is super nice (refer to my article about ref for more info):
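The listings that accompanied the text above are not reproduced here, so what follows is only a minimal sketch of the idea. The names SomeStruct, SomeStructWrapper and GetCopy are hypothetical, chosen purely for illustration. It contrasts a by-value accessor, which hands back a copy, with a ref-returning indexer:

```csharp
using System;

// Hypothetical element type; the field names are illustrative only.
public struct SomeStruct
{
    public int Value1;
    public int Value2;
}

public class SomeStructWrapper
{
    private readonly SomeStruct[] _array;

    public SomeStructWrapper(int size) => _array = new SomeStruct[size];

    // By-value access: the caller receives a copy of the struct,
    // so changes made to the result never reach the array.
    public SomeStruct GetCopy(int index) => _array[index];

    // C# 7.0 "ref return": the caller receives a reference to the
    // element stored in the array, with no copying involved.
    public ref SomeStruct this[int index] => ref _array[index];
}
```

With the ref-returning indexer, `wrapper[8].Value1 = 42;` writes straight into the array, whereas modifying the result of `GetCopy(8)` only changes a local copy.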

Here, 99.9999% of devs will stop and will be satisfied with the semantics and performance results. But… if we know we will call it tremendously often, can we do better?!

First of all, let’s see what is being JITted by the .NET Core x64 runtime (5.0rc) when accessing the 9th element (index is 8):

To those who know assembler a little, it may be clear what is going on here. But let’s make a short summary:

  • we see a little of “stack frame” creation here (sub/add rsp) – could we get rid of it in such a simple method?
  • we see a bounds check in line 4 (cmp the index to 8) to check if we are accessing the array with a correct index – could we get rid of it because we trust our code? 😇

Disclaimer: Getting rid of bounds checks is very risky and the resulting dangers will probably outweigh the performance benefits. Thus, use it only after heavy consideration, if you are sure why you need it and you can ensure the caller’s code will be correct (providing valid indices).

To continue, we will be walking on thin ice of unsafe code now.

The first idea is to use Unsafe.Add to provide kind of “pointer arithmetic” – add an index-element to the first element:
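The original listing is not reproduced in this text; a minimal sketch of this first attempt could look as follows. SomeStruct and SomeStructWrapper are hypothetical names, while Unsafe.Add comes from System.Runtime.CompilerServices:

```csharp
using System.Runtime.CompilerServices;

// Hypothetical element type used only for this sketch.
public struct SomeStruct { public int Value1; public int Value2; }

public class SomeStructWrapper
{
    private readonly SomeStruct[] _array;

    public SomeStructWrapper(int size) => _array = new SomeStruct[size];

    // Take a ref to element 0 (this access is a regular, checked array access),
    // then move that ref forward by `index` elements without any range check.
    // Assumption: the caller guarantees 0 <= index < array length.
    public ref SomeStruct this[int index] => ref Unsafe.Add(ref _array[0], index);
}
```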

The “problem” here is that it produces almost identical results, because _array[0] is still a bounds-checked array access (and we don’t get rid of the stack frame either):

Hence, a non-trivial question arises – how to get the address/ref to the first element of an array?

We could think of doing some Span-based magic (to use MemoryMarshal.GetReference):
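As the original listing is missing here, this is only a sketch of what such a Span-based variant might look like. The type names are hypothetical; AsSpan, MemoryMarshal.GetReference and Unsafe.Add are the real APIs involved:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical element type used only for this sketch.
public struct SomeStruct { public int Value1; public int Value2; }

public class SomeStructWrapper
{
    private readonly SomeStruct[] _array;

    public SomeStructWrapper(int size) => _array = new SomeStruct[size];

    // Wrap the array in a Span, ask MemoryMarshal for a ref to the span's
    // first element, then offset that ref by `index` elements.
    // Assumption: the caller guarantees 0 <= index < array length.
    public ref SomeStruct this[int index] =>
        ref Unsafe.Add(ref MemoryMarshal.GetReference(_array.AsSpan()), index);
}
```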

But you can probably feel it – it produces even slower and bigger code (Span creation handling etc.) while the bounds check will still be there (Span is “safe”).

So, we somehow need to find a better way of getting the address of the array’s first element. The thing is, the internal structure of the array type is an implementation detail (although well-known). How can we overcome that?

The idea is… to rely on that implementation detail. This approach is used by the DangerousGetReferenceAt method from the Microsoft.Toolkit.HighPerformance package maintained by Sergio Pedri. The DangerousGetReferenceAt source code explains it well:

So, we are casting (reinterpreting) an array reference as a reference to some artificial RawArrayData class, which has a layout corresponding to an array layout. Thus, getting “data” reference is now just trivial. No bound checks at all!

The good news is this method has been ported to .NET 5! So, in .NET 5.0rc we can already use MemoryMarshal.GetArrayDataReference which does exactly the same thing:

Thus, without any external dependencies our code in .NET 5 may be rewritten to:
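The rewritten listing is not part of this text, so here is a minimal sketch of how it might look, assuming .NET 5 or later. The type names are hypothetical; MemoryMarshal.GetArrayDataReference and Unsafe.Add are the real APIs:

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical element type used only for this sketch.
public struct SomeStruct { public int Value1; public int Value2; }

public class SomeStructWrapper
{
    private readonly SomeStruct[] _array;

    public SomeStructWrapper(int size) => _array = new SomeStruct[size];

    // GetArrayDataReference returns a ref to where element 0 is stored,
    // without a bounds check; Unsafe.Add then offsets it by `index` elements.
    // Assumption: the caller guarantees 0 <= index < array length.
    // Nothing here validates the index, so a wrong value corrupts memory.
    public ref SomeStruct this[int index] =>
        ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(_array), index);
}
```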

And the resulting code is indeed much more lightweight:

No bounds checks, and as an additional reward from the method’s simplicity – no stack frame.

Benchmarks are indeed showing a noticeable (well, in ns order of magnitude) difference:

Which simply means, we are now about 5x faster than with the initial solution!

Disclaimer #2: The approach taken here with the usage of GetArrayDataReference is super dangerous. As Levi Broderick, one of the .NET framework developers, said: “Also, read the method documentation. It does more than remove bounds checks; it also removes array variance checks. So it might not be valid to write to the ref, even if the index is within bounds. Misuse of the method will bite you in the ass, guaranteed.” Moreover, the documentation clearly states that “a reference may be used for pinning but must never be dereferenced” 😇

Sync over async in .NET is always bad and there is no better advice than just to avoid it. What does “sync over async” mean exactly? It happens when you synchronously wait on the result of an asynchronous operation with the help of .Result, .Wait or similar. Why is it bad? First of all, it blocks (wastes) one thread to wait on a result – which may lead to thread starvation. But even worse, it may deadlock your operation and (sometimes) the whole application.

You have probably heard all that before. I just wanted to present a picture, “worth a thousand words“, to explain why it happens.

[Figure: WinForms SynchronizationContext with a nested synchronous wait]

There is a concept of SynchronizationContext in .NET – an abstraction that knows how and where to schedule a work item (like an async/await continuation). When you await something, the SynchronizationContext is captured. And when the continuation is about to run, we use that SynchronizationContext to run it “somewhere”. SynchronizationContext implementations differ between scenarios (console, UI, web, mobile applications), because there are various needs to “synchronize” work items. The main example is a GUI-based application. When we start an asynchronous operation on the UI thread, we expect its continuations to “return” to the same thread.

But, if we .Result that operation, the main UI thread is blocked waiting on the result, so it is not able to process anything (including mouse/keyboard events). So there is no way the continuation (that would set the result) may run, thus we endlessly wait for the result – deadlock.

[Figure: WinForms SynchronizationContext with ConfigureAwait]

That’s why ConfigureAwait helps – it lets us say “I don’t care about scheduling the continuation to the original (captured) context“. Thanks to that, the continuation of the asynchronous operation is scheduled on a different thread (a thread pool one) and sets the result with no problem. This resumes the main UI thread, and there is no deadlock.
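To make the difference concrete, here is a minimal sketch with two hypothetical methods (the class and method names are made up for illustration, and Task.Delay stands in for a real asynchronous operation):

```csharp
using System.Threading.Tasks;

public static class SyncOverAsyncSketch
{
    // The continuation after the await is posted back to the captured
    // SynchronizationContext (for example, the UI thread).
    public static async Task<int> LoadOnCapturedContextAsync()
    {
        await Task.Delay(50);
        return 42;
    }

    // ConfigureAwait(false): the continuation does not need the captured
    // context, so it can run on a thread pool thread instead.
    public static async Task<int> LoadOnAnyThreadAsync()
    {
        await Task.Delay(50).ConfigureAwait(false);
        return 42;
    }

    // When called from a UI event handler:
    //   LoadOnCapturedContextAsync().Result  -> the UI thread blocks, the
    //       continuation can never run on it, and we deadlock.
    //   LoadOnAnyThreadAsync().Result        -> completes, although the UI
    //       thread is still blocked for the duration of the wait.
}
```

Note that the second variant only avoids the deadlock; it still wastes the blocked thread, so avoiding sync over async altogether remains the better option.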

Those were just two simple drawings. If you’d like to know more, refer to the great ConfigureAwait FAQ by Stephen Toub.

Again, all this is part of the work on a much bigger project, the awesome Async Expert online course about asynchronous and concurrent programming in .NET. If you found it interesting, stay tuned by subscribing to the newsletter on the above-mentioned page!

It is said that a picture is worth a thousand words, and I agree. That’s why I like preparing technical drawings to explain various concepts. So, here it is – a short story of how async/await works in .NET.

[Figure: there is no thread]

The main power behind async/await is that while we “await” an ongoing I/O operation, the calling thread may be released to do other work. This provides great thread reusability and thus better scalability – a much smaller number of threads is able to handle the same number of operations compared to a synchronous (blocking) approach.

The main role here is played by so-called overlapped I/O (in the case of Windows), which allows us to asynchronously delegate the I/O operation to the operating system; only after completion will the provided callback notify us about the result. The main workhorse here is the so-called I/O completion port (IOCP).
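From the C# side, this machinery is hidden behind an ordinary await. A minimal sketch, with hypothetical class and method names, of reading a file through a FileStream opened for asynchronous I/O:

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncIoSketch
{
    // Counts the bytes in a file. While each read is pending, no thread is
    // blocked waiting for it; the method resumes when the OS reports completion.
    public static async Task<long> CountBytesAsync(string path)
    {
        // useAsync: true asks for asynchronous (overlapped, on Windows) file I/O.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize: 4096, useAsync: true))
        {
            var buffer = new byte[4096];
            long total = 0;
            int read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
            {
                total += read;
            }
            return total;
        }
    }
}
```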


In the upcoming .NET 5 a very interesting change is added to the GC – a dedicated Pinned Object Heap, a brand new type of managed heap segment (so far we have had the Small and Large Object Heaps). Pinning has its own costs, because it introduces fragmentation (and in general complicates object compaction a lot). We are used to following some good practices about it, like “pin only for…:

  • a very short time”, so the GC will not bother – to reduce the probability that a GC happens while many objects are pinned. That’s a scenario for the fixed keyword, which is in fact only a very lightweight way of flagging a particular local variable as a pinned reference. As long as a GC does not happen, there is no additional overhead.
  • a very long time”, so the GC will promote those objects to generation 2 – as gen2 GCs should not be so common, the impact will also be minimized. That’s a scenario for a GCHandle of type Pinned, which has a slightly bigger overhead because we need to allocate/free the handle.
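As an illustration of the second scenario, here is a minimal sketch of pinning an array with a GCHandle. The class and method names are hypothetical; the fixed-based variant is left out only because it additionally requires an unsafe context:

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningSketch
{
    // Pins the array, reads its first element through the raw address,
    // and always frees the handle so the array becomes movable again.
    // Assumption: `data` has at least one element.
    public static int ReadFirstViaPinnedHandle(int[] data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // The address stays valid for as long as the handle is allocated.
            IntPtr address = handle.AddrOfPinnedObject();
            return Marshal.ReadInt32(address);
        }
        finally
        {
            handle.Free();
        }
    }
}
```

In a real long-lived scenario the handle would be kept for the lifetime of the buffer rather than allocated and freed per call; the short form here just shows the allocate/free pair responsible for the overhead.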

However, even if applied, those rules will produce some fragmentation, depending on how much you pin, for how long, what the resulting layout of the pinned objects in memory is, and many other intermittent conditions.

So, in the end, it would be perfect just to get rid of pinned objects and move them to a different place than SOH/LOH. This separate place would simply be ignored, by the GC’s design, when considering heap compaction, so we would get pinning behaviour out of the box.
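In .NET 5 such a separate place can be requested at allocation time through GC.AllocateArray with pinned: true. A minimal sketch, where the wrapper class and method names are hypothetical:

```csharp
using System;

public static class PinnedHeapSketch
{
    // Allocates a byte buffer that is pinned for its whole lifetime.
    // No fixed block or GCHandle is needed to keep it in place.
    // Note: the element type of a pinned array must not contain references.
    public static byte[] AllocatePinnedBuffer(int length)
    {
        return GC.AllocateArray<byte>(length, pinned: true);
    }
}
```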

[Figure: Mobius overview]

A .NET application is “just” a piece of CIL bytecode to be executed by the .NET runtime. And the .NET runtime is “just” a program that is able to perform this task. It happens that currently the .NET Framework/.NET Core runtimes are written in C++. I am also fully aware of CoreRT, a .NET runtime with many parts rewritten in C# (like the type system), but still, crucial parts (including the JIT compiler and the GC) were left written in C++.

But what if we write a .NET runtime as… a .NET application? Is it possible at all? I mean, literally no native/C++ code, everything running as a .NET Core application written in C#? Does this sound like a kind of inception and infinite recursion? It would require running one .NET runtime on top of another .NET runtime, right?

I decided to check it out and that’s how the idea of the Mobius runtime was born! Yeah, I know it sounds strange and I do not expect it to be anything close to a production-ready thingy in the nearest century. I am fully aware of the amount of code that needs to be written to make a full .NET runtime. However, I found it interesting to validate such an idea and I can see some small use cases for it as well. Imagine a NuGet package with a separate runtime that you can add to your application 😉



2 Developers from Poland join forces to publish their IT-related card games: from Devs to Devs – OutOfMemory and IT Startup!

Each of us already has a published book and now we want to share some knowledge through fun games! Remember all the original 151 Pokemon names? How about playing a game that lets you remember something useful, like which technologies are beneficial to learn to be a better Developer! One of the games is already published in Poland and has sold over 3k copies (so we already have some publishing know-how). Now we want to start a company and publish 2 of our IT-related card games worldwide in English.

Have fun while playing our print and play prototypes!

We plan to publish the games as one company on Kickstarter in Q1 2020.



So…after quite a serious thing which was writing Pro .NET Memory Management book, I’ve decided to experiment with a little pet project for having some more fun. I have quite a few very interesting ideas going on in my head. Yet, I needed to choose one!

And that’s how the idea of the OutOfMemory game prototype materialized! Ladies and gentlemen, please meet the first card game in the world about .NET-based high performance and memory-aware programming. Sounds so nerdy, doesn’t it?! That’s by design and I like it! From developer to developers, with love 😉 It contains a huge amount of stuff related to programming, hardware architecture and… GC obviously! This is going to be a physical card game, not a computer or a mobile one.

The goal of the game is to build an application. You build it by playing so-called Feature cards on the table – the first player to get a given number of Feature points wins!


But each Feature has its cost! Each Feature card allocates Memory and consumes CPU Ticks. At the end of each player’s turn, you add an appropriate amount of Memory. If you hit a given limit – OutOfMemory occurs! CPU Ticks also play an important role in limiting possible actions in a turn.

In each turn, the player takes a single card from the deck. Or two, if she has an appropriate special card which is… Adam Sitnik‘s card (an example of a Hero card). Another example of a .NET Hero card is Ben Adams – very powerful as it reduces both Memory and CPU Ticks, has a special ability (removes all the nasty Issues played against you by your opponents) and even adds a single Feature point (having Ben contributing to your app is a feature by itself)!


Guess who is on the other Hero cards?! Currently, I plan eight such cards!

Card(s) taken from the deck can be used immediately or kept in hand (but there is a limit on cards in hand, reflecting the CPU cache, my dear!).

There are various other types of cards. For example, there are Garbage Collection Action cards, so if you are lucky enough (and plan to keep them in advance) you can clean your Memory periodically.


A lot of cards help you keep Features while reducing their Memory and/or Ticks costs, like Span<T> or Lock-free programming.


There are also nasty Bug and Issue cards that you can play against your opponents, removing their Feature points or making their app more allocating and slower!


Additionally, there are various Action cards that can be played to receive a short, single-turn benefit. Some influence only you, some a given opponent, and some all the players – like Black Friday cards that make all Features double-allocating in the next turn (due to high volume traffic!).


All this is at a very early prototype stage, possibly requiring quite a lot of rethinking. And a lot of balancing is required to create a playable and enjoyable deck. Currently, my prototype consists of 80 cards.


I am playing this game a lot with… myself, to balance the very first prototype. When it is ready, I plan to publish a very rough, self-printable version and I hope there will be .NET developers out there willing to try it out! With your feedback, we may create an amazing game!

Nevertheless, I would LOVE to start receiving your feedback RIGHT away! Do you like this idea? Do you have ideas for Hero, Bug, Issue, Fix and Feature cards?

If you want to be informed about further work and the prototype, feel invited to subscribe at a dedicated page!


Note also that this initiative is a part of a bigger Dotnetos initiative – although currently only I am involved in the design of this game, most probably sooner or later all three Dotnetos will be somehow involved in it – I hope so!

Note. All graphics on the cards and the cards themselves are prototypes and do not reflect the final quality. Moreover, all cliparts were taken from Free Vectors via vecteezy.com.


I would like to announce with pleasure the initiative of Three Dot Netos. I am very excited because the preparations have been going on for several months. And here it is finally. I can officially and publicly announce it!

But… what?

We get in the car and start on the road through Poland. 5 cities, day by day. Every evening a different city and other people but the same topics – .NET performance, .NET internals and other advanced .NET themes. Hell of a ride for your brain! There will be no mercy. If you’re bored with sessions on .NET at other conferences, now you should be happy! Of course, this is not meant to be an empty talk. All topics discussed will be practical. But we will not repeat again the same boring “reference types are on the heap and value types are on the stack“. Oh no no! Detailed agenda will be announced in a few weeks. But be sure it will be interesting.

In the first edition, we will speak Polish. But who knows what the future will bring. However, if you know some Polish guys – tell them about us! Spread the word, this will always be helpful for planning further initiatives. Please 🙂


[Figure: Tune screenshot]

I would like to present to you a new tool I’ve started to work on recently. I’ve called it The Ultimate .NET Experiment (Tune) as its purpose is to learn .NET internals and performance tuning by experimenting with C# code. As it is currently at a very early 0.2 version, it can be treated as a Proof of Concept with many, many features still missing. But it is already usable enough to have some fun with it.

The main way of working with this tool is as follows:

  • write a sample, valid C# script which contains at least one class with a public method taking a single string parameter. It will be executed by hitting the Run button. This script can contain as many additional methods and classes as you wish. Just remember that the first public method from the first public class will be executed (with a single parameter taken from the input box below the script). You may also choose whether you want to build in Debug or Release mode (note: currently only x64 compilation is supported).
  • after clicking the Run button, the script will be compiled and executed. Additionally, it will be decompiled both to IL (Intermediate Language) and assembly code in the corresponding tab.
  • the whole time Tune is running (including during script execution), a graph with GC data is being drawn. It shows information about generation sizes and GC occurrences (as vertical lines with the number below showing which generation has been triggered).
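A script matching the description above might look like the following sketch. The class name, method name and return type are assumptions made for illustration, as is the choice of an instance method; the only requirement taken from the text is a public class with a public method accepting a single string parameter:

```csharp
using System;
using System.Text;

public class Experiment
{
    // Hypothetical entry point: `input` stands for the text typed into
    // the input box below the script.
    public string Run(string input)
    {
        // Treat the input as an iteration count, falling back to 1000.
        int count = int.TryParse(input, out int parsed) ? parsed : 1000;

        var builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            // Each iteration allocates a short-lived string, which gives
            // the GC something to collect.
            string temp = i.ToString();
            builder.Append(temp.Length);
        }
        return builder.Length.ToString();
    }
}
```

Raising the count passed through the input box increases the number of short-lived allocations the method performs.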
