Starting with .NET Core 2.0, the coupling between the Garbage Collector and the Execution Engine has been loosened. Prior to this version, the Garbage Collector code was pretty much tangled with the rest of the CoreCLR code. The Local GC initiative in version 2.0, however, is already mature enough to start using it. The purpose of this exercise is to prepare a Zero Garbage Collector that replaces the default one.
The Zero Garbage Collector is the simplest possible implementation, one that in fact does almost nothing. It only allows you to allocate objects, because this is obviously required by the Execution Engine. Created objects are never automatically deleted, so memory that is no longer needed is never reclaimed. Why would one be interested in such a simple GC implementation? There are at least two reasons:
- it is an excellent basis for the development of your own Garbage Collection mechanism. It provides the necessary functionality to make runtime work properly and you can build on top of that.
- it may be interesting for special use cases like very short-lived applications or ones that allocate almost no memory (you may know such concepts as No-alloc or Zero-alloc programming). In such cases the GC overhead is unnecessary and it may be wise to get rid of it. It is like wrapping your whole application in one huge GC.TryStartNoGCRegion.
Note: Everything I’ve done here is currently Windows-specific, so some steps have to be adapted for a Linux environment. I also assume that the newest .NET Core 2.0 SDK (not a preview) is installed.
Note: You can clone my GitHub repository and compile CoreCLR.ZeroGC project for reference. All following code samples come from this project.
Building custom CoreCLR
The default CoreCLR build does not have a customizable GC enabled (yet?), so we have to compile our own version with the FEATURE_STANDALONE_GC feature enabled. If you are not sure how to begin with CoreCLR compilation, please refer to the documentation or my own earlier post. A custom compilation is easy because a suitable option has already been added to build.cmd. I also assume a Debug build and skip all tests (for our experimental usage they just take too long):
```
> build.cmd -Debug -buildstandalonegc -skiptests
```
With the -buildstandalonegc option, two additional preprocessor defines are enabled: FEATURE_STANDALONE_GC and FEATURE_STANDALONE_GC_ONLY.
After a successful build you should have your brand new custom CoreCLR in the .\bin\Product\Windows_NT.x64.Debug folder. Although we could use the standard tooling and integrate it with the classic dotnet run command, it is not necessary here (and currently a little bit unstable anyway). We can use a simpler CLR host called CoreRun.exe, located in the mentioned folder (I will refer to this folder as CLR_DIR).
CoreCLR with standalone Garbage Collector
How is a custom GC handled in CoreCLR? Everything starts in the .\src\vm\ceemain.cpp:EEStartupHelper method, which initializes A LOT of things. Among others, there is a call to the .\src\vm\ceemain.cpp:InitializeGarbageCollector method:
```cpp
void InitializeGarbageCollector()
{
    HRESULT hr;
    ...
#ifdef FEATURE_STANDALONE_GC
    if (CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_GCUseStandalone)
#ifdef FEATURE_STANDALONE_GC_ONLY
        || true
#endif // FEATURE_STANDALONE_GC_ONLY
        )
    {
        LoadGarbageCollector();
    }
    else
#endif // FEATURE_STANDALONE_GC
    {
        LoadStaticGarbageCollector();
    }
    ...
}
```
In the default case the LoadStaticGarbageCollector function is called, which falls back to loading the default GC implementation from the runtime itself. But because FEATURE_STANDALONE_GC_ONLY is defined, LoadGarbageCollector is called in our case:
```cpp
#ifdef FEATURE_STANDALONE_GC
void LoadGarbageCollector()
{
    ...
    TCHAR *standaloneGc = nullptr;
    CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_GCStandaloneLocation, &standaloneGc);
    HMODULE hMod;
    ...
    hMod = CLRLoadLibrary(standaloneGc);
    InitializeGarbageCollectorFunction igcf =
        (InitializeGarbageCollectorFunction)GetProcAddress(hMod, INITIALIZE_GC_FUNCTION_NAME);

    // at this point we are committing to using the standalone GC
    // given to us.
    IGCToCLR* gcToClr = new (nothrow) standalone::GCToEEInterface();
    IGCHandleManager *pGcHandleManager;
    IGCHeap *pGCHeap;
    if (!igcf(gcToClr, &pGCHeap, &pGcHandleManager, &g_gc_dac_vars))
    {
        ThrowOutOfMemory();
    }
    ...
    g_pGCHeap = pGCHeap;
    g_pGCHandleManager = pGcHandleManager;
    g_gcDacGlobals = &g_gc_dac_vars;
}
#endif
```
As we can see, the procedure CoreCLR follows is pretty straightforward (for clarity I omitted all conditions and checks from the code):
- Get the location of the standalone GC from the GCStandaloneLocation setting. It can be set either through the registry or through an environment variable, in the usual CoreCLR way. We will use the COMPlus_GCStandaloneLocation environment variable pointing to our DLL.
- Load the specified library and find in it the function named by INITIALIZE_GC_FUNCTION_NAME (which is “InitializeGarbageCollector”), with the signature:

```cpp
// The function that initializes the garbage collector.
// Should only be called once: here, during EE startup.
// Returns true if the initialization was successful, false otherwise.
typedef bool (*InitializeGarbageCollectorFunction)(
    /* In */ IGCToCLR*,
    /* Out */ IGCHeap**,
    /* Out */ IGCHandleManager**,
    /* Out */ GcDacVars*
);
```

- Call the InitializeGarbageCollector function from the specified library, providing it with the IGCToCLR interface and obtaining the IGCHeap, IGCHandleManager and GcDacVars pointers.
- From now on the EE cooperates with our custom GC via the mentioned interfaces and no default GC code is used!
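The handshake described in the steps above can be tried out without building a DLL at all. The following is only a minimal sketch under stated assumptions: the interfaces are empty stand-ins rather than the real CoreCLR declarations, and ToyRuntime, ToyHeap, ToyHandleManager, ToyInitializeGarbageCollector, LoadedGc and LoadToyGc are hypothetical names invented for illustration. In the real runtime the function pointer comes from GetProcAddress.

```cpp
#include <cassert>

// Hypothetical, empty stand-ins for the CoreCLR interfaces (NOT the real declarations).
struct IGCToCLR { virtual ~IGCToCLR() = default; };
struct IGCHeap { virtual ~IGCHeap() = default; };
struct IGCHandleManager { virtual ~IGCHandleManager() = default; };
struct GcDacVars { int placeholder = 0; };

// Same shape as InitializeGarbageCollectorFunction shown above.
typedef bool (*InitializeGarbageCollectorFunction)(
    IGCToCLR*, IGCHeap**, IGCHandleManager**, GcDacVars*);

// "GC side": in a real build this function would be exported from the DLL.
struct ToyHeap : IGCHeap {};
struct ToyHandleManager : IGCHandleManager {};

bool ToyInitializeGarbageCollector(IGCToCLR* gcToClr, IGCHeap** heap,
                                   IGCHandleManager** handleManager, GcDacVars*)
{
    if (gcToClr == nullptr || heap == nullptr || handleManager == nullptr)
        return false;
    static ToyHeap toyHeap;                    // lives for the whole process
    static ToyHandleManager toyHandleManager;
    *heap = &toyHeap;
    *handleManager = &toyHandleManager;
    return true;
}

// "Runtime side": roughly what happens once the entry point has been located.
struct ToyRuntime : IGCToCLR {};
struct LoadedGc
{
    IGCHeap* heap = nullptr;
    IGCHandleManager* handleManager = nullptr;
};

bool LoadToyGc(InitializeGarbageCollectorFunction init, IGCToCLR* gcToClr, LoadedGc* out)
{
    GcDacVars dacVars;
    if (init == nullptr || !init(gcToClr, &out->heap, &out->handleManager, &dacVars))
        return false;
    return out->heap != nullptr && out->handleManager != nullptr;
}
```

Calling LoadToyGc(&ToyInitializeGarbageCollector, &runtime, &gc) with a ToyRuntime instance fills in both pointers; passing a null function pointer or a null runtime interface makes it report failure instead.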
Sounds very interesting and easy! From the user perspective it means we only need to define one environment variable and CoreCLR will replace default GC implementation. Great work has been done to separate the GC from the engine so well.
Standalone Garbage Collector library
Creating a DLL containing our own GC is fairly straightforward; however, the separation is not yet strong enough to guarantee there are no problems at all. One of the problems is including the appropriate set of header files. It turned out to be impossible to include them unchanged: each header pulled in the next one, and the next, until eventually you would have to include almost all CoreCLR sources. In the end, I just created my own header files, containing only the necessary definitions taken from the CoreCLR code.
We start creating our library by defining the required InitializeGarbageCollector as an exported function with C linkage:
```cpp
extern "C" DLLEXPORT bool InitializeGarbageCollector(
    /* In */ IGCToCLR* clrToGC,
    /* Out */ IGCHeap** gcHeap,
    /* Out */ IGCHandleManager** gcHandleManager,
    /* Out */ GcDacVars* gcDacVars
)
{
    // The heap keeps the IGCToCLR pointer so it can talk to the runtime later.
    IGCHeap* heap = new CustomGCHeap(clrToGC);
    IGCHandleManager* handleManager = new CustomGCHandleManager();
    *gcHeap = heap;
    *gcHandleManager = handleManager;
    return true;
}
```
In our simplified case its only responsibility is to return our implementations of the custom interfaces. We should also store the pointer to the IGCToCLR interface somewhere, because it allows us to cooperate with the runtime; here it is passed to the CustomGCHeap constructor. Now let’s look at each of these interfaces closely.
IGCToCLR interface
This interface, passed as an argument to the InitializeGarbageCollector function, is used to communicate with the runtime. It contains quite a lot of methods and listing them all here would be pointless. Let’s just look at some of the most interesting ones:
```cpp
class IGCToCLR {
public:
    // Suspends the EE for the given reason.
    virtual void SuspendEE(SUSPEND_REASON reason) = 0;

    // Resumes all paused threads, with a boolean indicating
    // if the EE is being restarted because a GC is complete.
    virtual void RestartEE(bool bFinishedGC) = 0;

    // Callback from the GC informing the EE that a GC has completed.
    virtual void GcDone(int condemned) = 0;

    // Callback from the GC informing the EE that it is preparing to start working.
    virtual void GcStartWork(int condemned, int max_gen) = 0;

    // Performs a stack walk of all managed threads and invokes the given promote_func
    // on all GC roots encountered on the stack. Depending on the condemned generation,
    // this function may also enumerate all static GC refs if necessary.
    virtual void GcScanRoots(promote_func* fn, int condemned, int max_gen, ScanContext* sc) = 0;

    // ...
};
```
With the help of those methods we can create sophisticated GC implementations. However, in the case of our Zero Garbage Collector, only one of them needs to be called, as we will see later.
IGCHeap interface
This is the main interface representing the core Garbage Collection functionality. Implementing IGCHeap requires implementing 71 methods! That can hardly be called loose coupling. The most important among them are the methods for allocation (Alloc, AllocLHeap) and initialization (Initialize):
```cpp
class CustomGCHeap : public IGCHeap
{
private:
    IGCToCLR* gcToCLR;
public:
    CustomGCHeap(IGCToCLR* gcToCLR)
    {
        this->gcToCLR = gcToCLR;
    }

    // Inherited via IGCHeap
    virtual bool IsValidSegmentSize(size_t size) override;
    virtual bool IsValidGen0MaxSize(size_t size) override;
    virtual size_t GetValidSegmentSize(bool large_seg = false) override;
    virtual void SetReservedVMLimit(size_t vmlimit) override;
    virtual void WaitUntilConcurrentGCComplete() override;
    virtual bool IsConcurrentGCInProgress() override;
    virtual void TemporaryEnableConcurrentGC() override;
    virtual void TemporaryDisableConcurrentGC() override;
    virtual bool IsConcurrentGCEnabled() override;
    virtual HRESULT WaitUntilConcurrentGCCompleteAsync(int millisecondsTimeout) override;
    virtual bool FinalizeAppDomain(AppDomain * pDomain, bool fRunFinalizers) override;
    virtual void SetFinalizeQueueForShutdown(bool fHasLock) override;
    virtual size_t GetNumberOfFinalizable() override;
    virtual bool ShouldRestartFinalizerWatchDog() override;
    virtual Object * GetNextFinalizable() override;
    virtual void SetFinalizeRunOnShutdown(bool value) override;
    virtual int GetGcLatencyMode() override;
    virtual int SetGcLatencyMode(int newLatencyMode) override;
    virtual int GetLOHCompactionMode() override;
    virtual void SetLOHCompactionMode(int newLOHCompactionMode) override;
    virtual bool RegisterForFullGCNotification(uint32_t gen2Percentage, uint32_t lohPercentage) override;
    virtual bool CancelFullGCNotification() override;
    virtual int WaitForFullGCApproach(int millisecondsTimeout) override;
    virtual int WaitForFullGCComplete(int millisecondsTimeout) override;
    virtual unsigned WhichGeneration(Object * obj) override;
    virtual int CollectionCount(int generation, int get_bgc_fgc_count = 0) override;
    virtual int StartNoGCRegion(uint64_t totalSize, bool lohSizeKnown, uint64_t lohSize, bool disallowFullBlockingGC) override;
    virtual int EndNoGCRegion() override;
    virtual size_t GetTotalBytesInUse() override;
    virtual HRESULT GarbageCollect(int generation = -1, bool low_memory_p = false, int mode = collection_blocking) override;
    virtual unsigned GetMaxGeneration() override;
    virtual void SetFinalizationRun(Object * obj) override;
    virtual bool RegisterForFinalization(int gen, Object * obj) override;
    virtual HRESULT Initialize() override;
    virtual bool IsPromoted(Object * object) override;
    virtual bool IsHeapPointer(void * object, bool small_heap_only = false) override;
    virtual unsigned GetCondemnedGeneration() override;
    virtual bool IsGCInProgressHelper(bool bConsiderGCStart = false) override;
    virtual unsigned GetGcCount() override;
    virtual bool IsThreadUsingAllocationContextHeap(gc_alloc_context * acontext, int thread_number) override;
    virtual bool IsEphemeral(Object * object) override;
    virtual uint32_t WaitUntilGCComplete(bool bConsiderGCStart = false) override;
    virtual void FixAllocContext(gc_alloc_context * acontext, bool lockp, void * arg, void * heap) override;
    virtual size_t GetCurrentObjSize() override;
    virtual void SetGCInProgress(bool fInProgress) override;
    virtual bool RuntimeStructuresValid() override;
    virtual size_t GetLastGCStartTime(int generation) override;
    virtual size_t GetLastGCDuration(int generation) override;
    virtual size_t GetNow() override;
    virtual Object * Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags) override;
    virtual Object * AllocLHeap(size_t size, uint32_t flags) override;
    virtual Object * AllocAlign8(gc_alloc_context * acontext, size_t size, uint32_t flags) override;
    virtual void PublishObject(uint8_t * obj) override;
    virtual void SetWaitForGCEvent() override;
    virtual void ResetWaitForGCEvent() override;
    virtual bool IsObjectInFixedHeap(Object * pObj) override;
    virtual void ValidateObjectMember(Object * obj) override;
    virtual Object * NextObj(Object * object) override;
    virtual Object * GetContainingObject(void * pInteriorPtr, bool fCollectedGenOnly) override;
    virtual void DiagWalkObject(Object * obj, walk_fn fn, void * context) override;
    virtual void DiagWalkHeap(walk_fn fn, void * context, int gen_number, bool walk_large_object_heap_p) override;
    virtual void DiagWalkSurvivorsWithType(void * gc_context, record_surv_fn fn, void * diag_context, walk_surv_type type) override;
    virtual void DiagWalkFinalizeQueue(void * gc_context, fq_walk_fn fn) override;
    virtual void DiagScanFinalizeQueue(fq_scan_fn fn, ScanContext * context) override;
    virtual void DiagScanHandles(handle_scan_fn fn, int gen_number, ScanContext * context) override;
    virtual void DiagScanDependentHandles(handle_scan_fn fn, int gen_number, ScanContext * context) override;
    virtual void DiagDescrGenerations(gen_walk_fn fn, void * context) override;
    virtual void DiagTraceGCSegments() override;
    virtual bool StressHeap(gc_alloc_context * acontext) override;
    virtual segment_handle RegisterFrozenSegment(segment_info * pseginfo) override;
    virtual void UnregisterFrozenSegment(segment_handle seg) override;
};
```
Looking at these methods, we see quite a lot of implementation details of the default .NET GC, which in theory should not be visible here: generations, the SOH and LOH, segments and concurrent GC. Does it mean we have to re-implement the default GC? Not at all. It turns out that most of these methods can get away with a dummy implementation, like:
```cpp
bool CustomGCHeap::RuntimeStructuresValid()
{
    return true;
}
```
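To give a feeling for what such dummy implementations look like in bulk, here is a small sketch. It is not the real IGCHeap: IMiniHeap is a hypothetical interface that borrows five method names (with simplified signatures) from the listing above, and the returned values are my assumption of what "neutral" answers look like for a collector that never collects.

```cpp
#include <cassert>

// Hypothetical reduced interface: five method names borrowed from IGCHeap,
// signatures simplified so the sketch compiles on its own.
struct IMiniHeap
{
    virtual ~IMiniHeap() = default;
    virtual bool IsConcurrentGCInProgress() = 0;
    virtual bool IsGCInProgressHelper(bool bConsiderGCStart) = 0;
    virtual int CollectionCount(int generation) = 0;
    virtual unsigned GetGcCount() = 0;
    virtual bool RuntimeStructuresValid() = 0;
};

// Every answer is the neutral one: nothing is ever in progress,
// nothing has ever been collected, and the structures are always valid.
struct ZeroMiniHeap final : IMiniHeap
{
    bool IsConcurrentGCInProgress() override { return false; }
    bool IsGCInProgressHelper(bool) override { return false; }
    int CollectionCount(int) override { return 0; }
    unsigned GetGcCount() override { return 0; }
    bool RuntimeStructuresValid() override { return true; }
};
```

Which neutral value is acceptable for a given real method has to be checked against how the runtime uses it; the sketch only illustrates the pattern.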
An important function is the initialization method, which should return NOERROR:
```cpp
HRESULT CustomGCHeap::Initialize()
{
    WriteBarrierParameters args = {};
    args.operation = WriteBarrierOp::Initialize;
    args.is_runtime_suspended = true;
    args.requires_upper_bounds_check = false;
    args.card_table = new uint32_t[1];
    // Inverted bounds: low is the highest possible address, high is 1.
    args.lowest_address = reinterpret_cast<uint8_t*>(~0);
    args.highest_address = reinterpret_cast<uint8_t*>(1);
    args.ephemeral_low = reinterpret_cast<uint8_t*>(~0);
    args.ephemeral_high = reinterpret_cast<uint8_t*>(1);
    gcToCLR->StompWriteBarrier(&args);
    return NOERROR;
}
```
The Initialize method should also configure the write barriers required by the GC. The Zero Garbage Collector needs no write barriers, so ideally we would do nothing here. Instead we do some magic to fool the runtime about the managed heap boundaries (by setting ephemeral_low and ephemeral_high). With ephemeral_low set to the highest possible address and ephemeral_high set to 1, no pointer ever falls inside the "ephemeral" range, so the write barrier code exits early without touching the card table. This can be seen in the CoreCLR sources:
```asm
ifdef _DEBUG
; Version for when we're sure to be in the GC, checks whether or not the card
; needs to be updated
;
; void JIT_WriteBarrier_Debug(Object** dst, Object* src)
LEAF_ENTRY JIT_WriteBarrier_Debug, _TEXT
        ...
    CheckCardTable:
        ; See if we can just quick out
        cmp     rax, [g_ephemeral_low]
        jb      Exit
        cmp     rax, [g_ephemeral_high]
        jnb     Exit

        ; Check if we need to update the card table
        ; Calc pCardByte
        shr     rcx, 0Bh
        add     rcx, [g_card_table]

        ; Check if this card is dirty
        cmp     byte ptr [rcx], 0FFh
        jne     UpdateCardTable
        REPRET

    UpdateCardTable:
        mov     byte ptr [rcx], 0FFh
        ret

    align 16
    Exit:
        REPRET
LEAF_END_MARKED JIT_WriteBarrier_Debug, _TEXT
endif
```
By the way, this shows that some coupling between runtime and GC still exists and we have to use such tricks to overcome it.
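The effect of the inverted bounds can be checked with plain integer arithmetic. The sketch below restates the two comparisons at the CheckCardTable label as a C++ predicate; WouldTouchCardTable, kInvertedLow and kInvertedHigh are hypothetical names used only for illustration and are not part of CoreCLR.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors "cmp rax, [g_ephemeral_low] / jb Exit" and
// "cmp rax, [g_ephemeral_high] / jnb Exit": the card table is only
// reached when ephemeralLow <= address < ephemeralHigh.
constexpr bool WouldTouchCardTable(std::uintptr_t address,
                                   std::uintptr_t ephemeralLow,
                                   std::uintptr_t ephemeralHigh)
{
    return address >= ephemeralLow && address < ephemeralHigh;
}

// The values passed to StompWriteBarrier in Initialize: ~0 and 1.
constexpr std::uintptr_t kInvertedLow = ~static_cast<std::uintptr_t>(0);
constexpr std::uintptr_t kInvertedHigh = 1;
```

With these bounds the predicate is false for every address: anything below the maximum fails the first comparison, and the maximum itself fails the second. A normal, non-inverted range such as [0x1000, 0x3000) still accepts 0x2000, which is what the default GC relies on.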
The other two important methods are Alloc, for allocations on the Small Object Heap, and AllocLHeap, for allocations on the Large Object Heap. As our Zero GC does not use those concepts, we just provide the same implementation for both of them:
```cpp
Object* CustomGCHeap::Alloc(gc_alloc_context * acontext, size_t size, uint32_t flags)
{
    // size_t (not int) avoids truncating large requests.
    size_t sizeWithHeader = size + sizeof(ObjHeader);
    // calloc(count, elementSize): one zeroed block of sizeWithHeader bytes.
    ObjHeader* address = (ObjHeader*)calloc(1, sizeWithHeader);
    // The object starts right after its header.
    return (Object*)(address + 1);
}

Object* CustomGCHeap::AllocLHeap(size_t size, uint32_t flags)
{
    size_t sizeWithHeader = size + sizeof(ObjHeader);
    ObjHeader* address = (ObjHeader*)calloc(1, sizeWithHeader);
    return (Object*)(address + 1);
}
```
This is the simplest implementation I could imagine. It uses the standard calloc function, which zeroes the allocated memory, as required by the runtime. It allocates size bytes for the object itself plus additional space for the header, defined as:
```cpp
class ObjHeader
{
private:
#ifdef _WIN64
    DWORD m_alignpad;
#endif // _WIN64
    DWORD m_SyncBlockValue;
};
```
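The memory layout that Alloc hands back to the runtime (header first, object right behind it) can be sketched independently of CoreCLR. This is only an illustration under assumptions: ToyObjHeader is a hypothetical stand-in for the 64-bit ObjHeader using fixed-width integers instead of DWORD, and ToyAlloc and HeaderOf are made-up helper names. Unlike the code above, the sketch also reports allocation failure and size overflow by returning nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical stand-in for the runtime's ObjHeader on a 64-bit target.
struct ToyObjHeader
{
    std::uint32_t alignPad;
    std::uint32_t syncBlockValue;
};

// Returns zeroed storage for an object of `size` bytes, preceded by a
// header, or nullptr if the request overflows or calloc fails.
void* ToyAlloc(std::size_t size)
{
    if (size > std::numeric_limits<std::size_t>::max() - sizeof(ToyObjHeader))
        return nullptr;
    auto* header = static_cast<ToyObjHeader*>(
        std::calloc(1, size + sizeof(ToyObjHeader)));
    if (header == nullptr)
        return nullptr;
    return header + 1; // the object starts right after its header
}

// Walks back from an object pointer to the header in front of it.
ToyObjHeader* HeaderOf(void* object)
{
    return static_cast<ToyObjHeader*>(object) - 1;
}
```

For example, ToyAlloc(24) returns a pointer whose preceding eight bytes form the zeroed header. A Zero GC never frees this memory; releasing it with std::free(HeaderOf(obj)) only makes sense in a test.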
This is, in fact, all we need to have a custom heap working! From now on, all allocations will be made using the standard calloc function. In a real GC, the Alloc method would check memory conditions (space shortage in generations or any other condition that we invent) and invoke a garbage collection if needed. This is what makes up the nearly 40k lines of code in the original gc.cpp. We have about 400 lines:) Obviously, there is no garbage collection at all, and our GarbageCollect (which may be called because of the OS or by your application, for example via GC.Collect) does nothing either:
```cpp
HRESULT CustomGCHeap::GarbageCollect(int generation, bool low_memory_p, int mode)
{
    return NOERROR;
}
```
IGCHandleManager interface
IGCHandleManager is the second important interface required when implementing a custom GC. It represents the handle manager functionality. Handles are used extensively by the runtime internally, so this implementation has to provide minimal working functionality rather than just dummy methods, even if our application code does not use handles at all. The interface is again quite wide:
```cpp
class CustomGCHandleManager : public IGCHandleManager
{
    // Inherited via IGCHandleManager
    virtual bool Initialize() override;
    virtual void Shutdown() override;
    virtual void * GetHandleContext(OBJECTHANDLE handle) override;
    virtual IGCHandleStore * GetGlobalHandleStore() override;
    virtual IGCHandleStore * CreateHandleStore(void * context) override;
    virtual void DestroyHandleStore(IGCHandleStore * store) override;
    virtual OBJECTHANDLE CreateGlobalHandleOfType(Object * object, HandleType type) override;
    virtual OBJECTHANDLE CreateDuplicateHandle(OBJECTHANDLE handle) override;
    virtual void DestroyHandleOfType(OBJECTHANDLE handle, HandleType type) override;
    virtual void DestroyHandleOfUnknownType(OBJECTHANDLE handle) override;
    virtual void SetExtraInfoForHandle(OBJECTHANDLE handle, HandleType type, void * pExtraInfo) override;
    virtual void * GetExtraInfoFromHandle(OBJECTHANDLE handle) override;
    virtual void StoreObjectInHandle(OBJECTHANDLE handle, Object * object) override;
    virtual bool StoreObjectInHandleIfNull(OBJECTHANDLE handle, Object * object) override;
    virtual void SetDependentHandleSecondary(OBJECTHANDLE handle, Object * object) override;
    virtual Object * GetDependentHandleSecondary(OBJECTHANDLE handle) override;
    virtual Object * InterlockedCompareExchangeObjectInHandle(OBJECTHANDLE handle, Object * object, Object * comparandObject) override;
    virtual HandleType HandleFetchType(OBJECTHANDLE handle) override;
    virtual void TraceRefCountedHandles(HANDLESCANPROC callback, uintptr_t param1, uintptr_t param2) override;
};
```
The most important implementations are those related to the underlying IGCHandleStore:
```cpp
CustomGCHandleStore* g_gcGlobalHandleStore;

bool CustomGCHandleManager::Initialize()
{
    g_gcGlobalHandleStore = new CustomGCHandleStore();
    return true;
}

IGCHandleStore * CustomGCHandleManager::GetGlobalHandleStore()
{
    return g_gcGlobalHandleStore;
}
```
As a handle is just a pointer-sized memory block storing the address of an object, the simplest implementations of handle manipulation are trivial:
```cpp
void CustomGCHandleManager::StoreObjectInHandle(OBJECTHANDLE handle, Object * object)
{
    Object** handleObj = (Object**)handle;
    *handleObj = object;
}

bool CustomGCHandleManager::StoreObjectInHandleIfNull(OBJECTHANDLE handle, Object * object)
{
    // TODO: This is not thread-safe
    Object** handleObj = (Object**)handle;
    if (*handleObj == NULL)
    {
        *handleObj = object;
        return true;
    }
    return false;
}

Object* CustomGCHandleManager::InterlockedCompareExchangeObjectInHandle(OBJECTHANDLE handle, Object * object, Object * oldObject)
{
    // TODO: This is not thread-safe
    Object** handleObject = (Object**)handle;
    Object* previous = *handleObject;
    if (previous == oldObject)
    {
        *handleObject = object;
    }
    // A compare-exchange conventionally returns the value held before the call.
    return previous;
}
```
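The two TODO comments above could be addressed with atomic operations. What follows is a minimal sketch, not the project's code: it assumes a handle slot can be modelled as a std::atomic pointer, and ToyObject, ToyHandleSlot and the three free functions are hypothetical names. The compare-exchange variant returns the value the slot held before the call, which is the usual convention for interlocked compare-exchange operations.

```cpp
#include <atomic>
#include <cassert>

struct ToyObject { int value; };

// Hypothetical handle slot: an atomic pointer instead of a raw Object*.
using ToyHandleSlot = std::atomic<ToyObject*>;

void StoreObject(ToyHandleSlot& slot, ToyObject* object)
{
    slot.store(object);
}

// Stores `object` only if the slot is currently null; true on success.
bool StoreObjectIfNull(ToyHandleSlot& slot, ToyObject* object)
{
    ToyObject* expected = nullptr;
    return slot.compare_exchange_strong(expected, object);
}

// Replaces the slot content with `object` if it equals `comparand`.
// Returns the value the slot held before the call in both cases:
// on failure compare_exchange_strong writes the observed value into
// `expected`, on success `expected` still equals the old value.
ToyObject* CompareExchangeObject(ToyHandleSlot& slot, ToyObject* object, ToyObject* comparand)
{
    ToyObject* expected = comparand;
    slot.compare_exchange_strong(expected, object);
    return expected;
}
```

A caller can tell whether the exchange happened by comparing the returned pointer with the comparand it passed in.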
There is also one very simplified method that returns the context of the handle in terms of an AppDomain index. I’ve taken the easy way out here and just return the first AppDomain index, which obviously may not be enough for more sophisticated applications:
```cpp
void * CustomGCHandleManager::GetHandleContext(OBJECTHANDLE handle)
{
    return (void*)1;
}
```
My custom handle storage is also oversimplified (it will crash the entire runtime if we create more than 65,535 handles, because the array index is never checked) and obviously requires re-implementing, but it works for Proof of Concept purposes:
```cpp
int handlesCount = 0;
OBJECTHANDLE handles[65535];

OBJECTHANDLE CustomGCHandleStore::CreateHandleOfType(Object * object, HandleType type)
{
    // No bounds check: writing past the end of the array is fatal.
    handles[handlesCount] = (OBJECTHANDLE__*)object;
    return (OBJECTHANDLE)&handles[handlesCount++];
}
```
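One small step beyond the fixed array is to make running out of slots an explicit, detectable condition. The sketch below is an assumption-laden illustration rather than a replacement for the real handle table: ToyHandleStore is a hypothetical class, objects are plain void pointers, and the store is neither thread-safe nor able to release handles.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity handle store. A "handle" is the address of
// the slot that holds the object pointer, as in the code above.
template <std::size_t Capacity>
class ToyHandleStore
{
public:
    // Returns the address of the slot now holding `object`,
    // or nullptr when every slot is already taken.
    void** CreateHandle(void* object)
    {
        if (count_ >= Capacity)
            return nullptr;
        slots_[count_] = object;
        return &slots_[count_++];
    }

    std::size_t Count() const { return count_; }

private:
    void* slots_[Capacity] = {};
    std::size_t count_ = 0;
};
```

With a ToyHandleStore<2>, the first two CreateHandle calls return distinct slot addresses and the third returns nullptr instead of writing out of bounds. How the runtime should react to a null handle is a separate question that this sketch does not answer.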
This is in fact all we need to have our custom Garbage Collector working. Obviously there are some parts missing (especially real handles management) but as PoC it works perfectly!
Running custom CoreCLR with custom GC
How can we put it all together? At this point we should have our custom CoreCLR and the ZeroGC library already compiled. As a sample application, I suggest using a fresh .NET Core 2.0 console application created with the dotnet new console command. Then, from the application directory, we set three environment variables and run the application:
```
> set CORE_LIBRARIES=c:\Program Files\dotnet\shared\Microsoft.NETCore.App\2.0.0
> set COMPlus_GCStandaloneLocation=f:\GithubProjects\CoreCLR.ZeroGC\bin\x64\Debug\ZeroGC.dll
> set COMPlus_HeapVerify=16
> <CLR_DIR>\CoreRun.exe bin\Debug\netcoreapp2.0\CoreCLR.HelloWorld.dll
```
The CORE_LIBRARIES variable is required by CoreRun itself and points to the .NET Core folder. COMPlus_GCStandaloneLocation was mentioned before; it tells the CLR to use our custom GC. Unfortunately, for a Debug version of the runtime we currently also need to set COMPlus_HeapVerify=16 (which sets HEAPVERIFY_NO_RANGE_CHECKS (0x10) for HeapVerifyLevel), as it disables checking whether an OBJECTREF is within the bounds of the managed heap. There is in fact no managed heap, so those checks make no sense.
And that’s all! We have just run the very first .NET Core program with our Zero Garbage Collector enabled.
Note: Thanks to the loose coupling between the EE and the GC, you can also go the opposite way! You can re-use the automatic memory management code from CoreCLR in a stand-alone application, without the whole .NET runtime. There is even a separate GCSample project included in the CoreCLR code doing exactly that.