Note: This is the first entry in a new series about Microsoft .NET CLR internals. I encourage you to ask questions; the most interesting ones will become similar posts in the future! This one was inspired by Angelika Piątkowska.
How does Object.GetType() really work?
Extending this question: How does an object know what type it is? Is it the compiler or the runtime that knows? How is this related to the CLR? As C# is statically and strongly typed, maybe the GetType() call does not really exist, and the compiler can replace it with the appropriate result at compile time?
We can quickly answer the last question empirically:
object o = new Random().Next(2) == 0 ? new BaseClass() : new DerivedClass();
What is the result of o.GetType()? We do not know it at compile time or at JIT time, only at execution time, so the answer has to be determined at runtime. Hence, we should dig deeper.
If we look at the Object.GetType() method in the .NET Framework Reference Source, it quickly turns out that there is nothing really interesting there:
// Returns a Type object which represent this object instance.
[MethodImplAttribute(MethodImplOptions.InternalCall)]
public extern Type GetType();
Note that this method is not marked as virtual, but it behaves like a virtual one: for each object it returns the actual type. This is due to its special, internal implementation. The InternalCall attribute value means that the method is implemented internally in the CLR. Thanks to CoreCLR we can look deeper. To find the internal implementation of an InternalCall function, we look at CoreCLR’s source file .\Src\vm\ecalllist.h, which contains the relevant mapping. In our case it is:
FCFuncStart(gObjectFuncs)
    FCIntrinsic("GetType", ObjectNative::GetClass, CORINFO_INTRINSIC_Object_GetType)
    FCFuncElement("MemberwiseClone", ObjectNative::Clone)
FCFuncEnd()
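The idea behind such a mapping can be illustrated with a toy lookup table. This is only a simplified C++ sketch, not what the CoreCLR macros expand to: InternalCallEntry, kObjectFuncs, FindInternalCall and the two stub functions are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for native implementations (not CoreCLR functions).
using NativeFn = int (*)();
int GetClassStub() { return 1; }
int CloneStub()    { return 2; }

// One row of the toy table: managed method name -> native entry point.
struct InternalCallEntry {
    const char* managedName;
    NativeFn    nativeImpl;
};

// Toy equivalent of a per-class function list.
const InternalCallEntry kObjectFuncs[] = {
    {"GetType",         &GetClassStub},
    {"MemberwiseClone", &CloneStub},
};

// Linear search by name; returns nullptr when nothing is registered.
NativeFn FindInternalCall(const char* managedName) {
    assert(managedName != nullptr);
    for (const InternalCallEntry& e : kObjectFuncs) {
        if (std::strcmp(e.managedName, managedName) == 0) return e.nativeImpl;
    }
    return nullptr;
}
```

In this sketch, FindInternalCall("GetType") yields the pointer to GetClassStub, while a name that was never registered yields nullptr.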
And thus we come to the implementation (here and below I omit irrelevant code):
// This routine is called by the Object.GetType() routine. It is a major way to get the Sytem.Type
FCIMPL1(Object*, ObjectNative::GetClass, Object* pThis)
{
    // ...
    OBJECTREF objRef = ObjectToOBJECTREF(pThis);
    if (objRef != NULL)
    {
        MethodTable* pMT = objRef->GetMethodTable();
        OBJECTREF typePtr = pMT->GetManagedClassObjectIfExists();
        if (typePtr != NULL)
        {
            return OBJECTREFToObject(typePtr);
        }
    }
    else
        FCThrow(kNullReferenceException);
    FC_INNER_RETURN(Object*, GetClassHelper(objRef));
}
FCIMPLEND
In short, what we see here is getting the object’s so-called MethodTable (Object::GetMethodTable) and returning the corresponding Type object (MethodTable::GetManagedClassObjectIfExists), or creating it if it does not exist yet (GetClassHelper) 1). Here we should stop for a moment and, for clarity, divide our discussion into separate steps.
MethodTable
An inseparable element of the CLR’s type system is the MethodTable, a data structure describing a given type. These structures are kept in a separate memory area of the process. They describe, among other things, what methods the type includes and what interfaces it implements. This is not the place to describe the structure in detail. Just assume that a MethodTable is some kind of internal description of the type 2). Looking at the class representing objects in memory (those found on the heap), we find that its first element is a pointer to the MethodTable:
// code:Object is the respesentation of an managed object on the GC heap.
//
// See code:#ObjectModel for some important subclasses of code:Object
//
// The only fields mandated by all objects are
//
//     * a pointer to the code:MethodTable at offset 0
//     * a poiner to a code:ObjHeader at a negative offset. This is often zero. It holds information that
//         any addition information that we might need to attach to arbitrary objects.
//
class Object
{
  protected:
    PTR_MethodTable m_pMethTab;
    // ...
The previously seen GetMethodTable() method simply returns that pointer:
PTR_MethodTable Object::GetMethodTable() const
{
    // ...
    return m_pMethTab;
    // ...
}
Each object on the heap has a strict memory layout; sizes vary depending on whether the process is 32-bit or 64-bit:
As we see, in addition to the data itself, there is additional space for the mentioned MethodTable pointer. Its address is what any reference to the object from other objects points to. Just before this address sits the object header. Often it consists of nothing but zeros, but it can also be used, among other possibilities, by the synchronization mechanisms. Anyway, in our investigation we already know that the first step is simply to read the address located at the beginning of the object, which points to the MethodTable structure.
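The layout described above can be modeled with a few plain C++ structs. This is a toy sketch under simplifying assumptions, not the CLR’s actual definitions: HeapBlock, ReferenceTo and HeaderOf are hypothetical helpers, and the MethodTable here holds only a name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy type description; the real MethodTable is far richer.
struct MethodTable {
    const char* typeName;
};

// Toy object header; in this model it is a single word that is often zero.
struct ObjHeader {
    std::uintptr_t syncWord;
};

// The part of the block that a reference points at.
struct Object {
    MethodTable* methodTable;  // offset 0 of the object proper
    int          payload;      // instance data follows the pointer
};

// Hypothetical heap block: header first, then the object itself.
struct HeapBlock {
    ObjHeader header;
    Object    object;
};

// A "reference" is the address of the MethodTable pointer, not of the block.
Object* ReferenceTo(HeapBlock& block) { return &block.object; }

// The header is found at a negative offset from the reference.
ObjHeader* HeaderOf(Object* ref) {
    assert(ref != nullptr);
    return reinterpret_cast<ObjHeader*>(
        reinterpret_cast<char*>(ref) - offsetof(HeapBlock, object));
}

// First step of GetType(): read the pointer stored at the start of the object.
MethodTable* GetMethodTable(const Object* ref) { return ref->methodTable; }
```

The point of the model is that no lookup is needed to find the type description: dereferencing the reference already yields the MethodTable pointer.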
Type
To obtain the Type object representing the MethodTable, the OBJECTREF MethodTable::GetManagedClassObjectIfExists method is executed; it looks inside internal structures to check whether the Type object has already been created. If the object does not exist, it is created by an indirect call to the MethodTable::GetManagedClassObject method. Either way, as a result we get a pointer to the object, which in managed code becomes a reference to the appropriate Type object.
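The "return it if it exists, otherwise create it" pattern can be sketched as follows. This is a toy C++ model, not the CoreCLR code: it borrows the two method names from the text for readability, but TypeObject is a hypothetical stand-in for the managed Type object, and the sketch ignores thread safety and garbage collection entirely.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the managed System.Type object.
struct TypeObject {
    std::string name;
};

// Toy MethodTable that caches its Type object after first use.
class MethodTable {
public:
    explicit MethodTable(std::string name) : name_(std::move(name)) {}

    // Fast path: returns nullptr when nothing has been created yet.
    TypeObject* GetManagedClassObjectIfExists() const { return cached_.get(); }

    // Slow path: create the Type object once, then reuse it.
    TypeObject* GetManagedClassObject() {
        if (!cached_) cached_ = std::make_unique<TypeObject>(TypeObject{name_});
        return cached_.get();
    }

private:
    std::string name_;
    std::unique_ptr<TypeObject> cached_;
};

struct Object {
    MethodTable* methodTable;
};

// Try the cached object first; create it only on a miss.
TypeObject* GetType(Object* obj) {
    assert(obj != nullptr);  // the CLR throws NullReferenceException instead
    MethodTable* mt = obj->methodTable;
    if (TypeObject* t = mt->GetManagedClassObjectIfExists()) return t;
    return mt->GetManagedClassObject();
}
```

In this model, two objects sharing one MethodTable get the very same TypeObject pointer back, which mirrors why GetType() on two instances of the same class returns the same Type reference.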
And that’s it! In fact, we see that GetType() does not have a particularly complicated implementation. It just delves into certain CLR structures that are associated with each object on the heap. But what about objects on the stack? Keep reading!
Object on the stack
On the stack live so-called value type objects, which we know as structs. It should be emphasized that this is really an implementation detail, not a defining characteristic. Nevertheless, looking at how they are implemented, they have a much simpler memory representation, consisting only of their values:
In other words, the CLR does not store information about the type anywhere in the value type object itself. This does not mean that a MethodTable for value types does not exist. It just does not need to be “stuck” to the object, because the compiler is able to infer much more here. Because of the lack of inheritance, the exact type is already known at compile time. So now the question arises of how GetType() works in such a case. Let’s look at the following example:
struct MyStruct { int x; int y; }
void Main()
{
    MyStruct s = new MyStruct();
    var t = s.GetType();
}
How did “something” know what s is? Perhaps the compiler or the JIT replaces the call with the appropriate result here? Although theoretically that could happen, let’s look at the CIL generated from this example:
IL_0001:  ldloca.s    00 // s
IL_0003:  initobj     UserQuery.MyStruct
IL_0009:  ldloc.0     // s
IL_000A:  box         UserQuery.MyStruct
IL_000F:  call        System.Object.GetType
IL_0014:  stloc.1     // t
The answer is easy to spot. Prior to calling the GetType() method, the value type is boxed (its exact type is known to the compiler). The boxing operation allocates a new object on the heap, whose layout is already known to us. In particular, it contains a proper MethodTable pointer.
HCIMPL2(Object*, JIT_Box, CORINFO_CLASS_HANDLE type, void* unboxedData)
{
    TypeHandle clsHnd(type);
    MethodTable *pMT = clsHnd.AsMethodTable();
    // ..
    newobj = pMT->FastBox(&unboxedData);
    return(OBJECTREFToObject(newobj));
}
HCIMPLEND

OBJECTREF MethodTable::FastBox(void** data)
{
    // ..
    OBJECTREF ref = Allocate();
    CopyValueClass(ref->UnBox(), *data, this, ref->GetAppDomain());
    return ref;
}
Hence GetType() is processed as usual. Since the boxed object has the typical layout, we can use the standard Object.GetType() method, which gets the object’s MethodTable and returns the corresponding Type object.
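Boxing can be pictured with a small C++ model: the value itself carries no type pointer, and the boxed copy gains one. This is a toy sketch, not the runtime’s allocator: BoxedMyStruct, kMyStructMT, Box and GetTypeName are hypothetical names, and the heap allocation is an ordinary std::unique_ptr rather than a GC allocation.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Toy type description; the real MethodTable is far richer.
struct MethodTable {
    const char* typeName;
};

// Value type: only its fields, no type pointer.
struct MyStruct {
    int x;
    int y;
};
static_assert(sizeof(MyStruct) == 2 * sizeof(int),
              "the value holds nothing but its two fields");

// Hypothetical boxed form: MethodTable pointer first, then a copy of the value.
struct BoxedMyStruct {
    const MethodTable* methodTable;
    MyStruct           value;
};

// The static type is known at the call site, so the right table can be passed.
const MethodTable kMyStructMT{"MyStruct"};

// Toy box operation: allocate, stamp the type pointer, copy the payload.
std::unique_ptr<BoxedMyStruct> Box(const MyStruct& unboxed) {
    auto boxed = std::make_unique<BoxedMyStruct>();
    boxed->methodTable = &kMyStructMT;
    std::memcpy(&boxed->value, &unboxed, sizeof(MyStruct));
    return boxed;
}

// With the pointer in place, the usual lookup works on the boxed object.
const char* GetTypeName(const BoxedMyStruct& obj) {
    assert(obj.methodTable != nullptr);
    return obj.methodTable->typeName;
}
```

Note that the box holds a copy: changing the original struct afterwards does not affect the boxed value.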
Summary
And that’s all regarding how GetType() works. If you have any questions, do not hesitate to ask!
PS. While answering this question, a few more arose, which I will try to answer in the future:
- Why does the CLR not support inheritance of structures?
- How is the dynamic type supported, and how does it relate to the rules described here?
- What if your type declares public new Type GetType() { return typeof(string); }?
- What do MethodTable and related structures look like? When are they created, and in what area of memory?
- Since the compiler knows the exact type of a value type, why does it not replace the GetType() call directly with the proper result?
- What does the memory layout of class C {} and struct S {} (meaning empty) look like? How much memory do they use?
Footnotes:
1) The visible condition on the ObjectToOBJECTREF result can be ignored, because in Release mode OBJECTREF is simply an alias for Object* and the macro has the form:
#define ObjectToOBJECTREF(obj) ((PTR_Object) (obj))
In Debug mode, OBJECTREF (.\src\vm\vars.hpp) is a class wrapping Object* with some additional diagnostics (“we use operator overloading to detect common programming mistakes that create GC holes”).
2) In fact, things are of course a bit more complex, and the MethodTable is just an entry point into more complex structures containing information about the type.
Quite an interesting post. Thanks for pointing out core CLR as well.
Wonderful write-up! Back in mid 2000s I had to poke around Rotor (SSCLI) source code for quite a while before I got to this level of details 🙂
Too bad you didn’t get to write about those 6 topics you stirred our appetite for 🙂
Fair point! I completely forgot about those points…
Yeah, now we have the whole runtime on Github 😀
How does GetType work in the Nullable type?
Nullable is a struct; it overrides all methods to forward them to this.value (ToString=>this.value.ToString()).
It doesn’t override GetType and inherits its implementation from object.
But bool? b; b.GetType() returns bool. Where is it substituted?