I believe in keeping
things as simple as possible. If there are n ways of doing something,
then the best one is the simplest of all. So what makes something
simple? Well, the one which requires the fewest other things to be
understood before it can be understood is the simplest one. Likewise,
the one which has the fewest parameters can be considered the
simplest one.
So here I am trying to
present the story of garbage collection in the .NET environment in the
simplest possible way (??).
Let us start with the
cause which led to the effect of garbage collection. We know necessity
is the mother of invention. So what was the necessity to invent
garbage collection?
Unmanaged Vs Managed Environment:
For those of you who
have written programs in an unmanaged environment like
C/C++, you might remember the unpredictable bugs which used to creep
into your code, either because you forgot to
free some allocated chunk of memory or because you tried to
access some memory location which had already been
freed earlier.
For those of you who
have not written programs in an unmanaged environment: the unmanaged
programming style was to programmatically deallocate memory when the
data residing in that memory location was no longer required by the
program. Failure to do so would cause a memory leak, which meant that
memory was being wasted on data which was no longer required (this is
what we call garbage data!). There were also issues when a programmer
accidentally tried to access data after deallocating its memory, which
typically crashed the application at run time. Remember applications
crashing with the message
'The instruction at "0x7c9105f8" referenced memory
at "0x025f0010". The memory could not be "read."'? Why could the
memory not be read? Most probably because it had been freed
already :-)
Enter the era of Java,
which can be called C++ Minus Pointers in terms of features (in fact
there are many other differences too; for example, C++ allows multiple
inheritance while Java does not). Java did not allow the programmer to
access any and every memory location. In fact there are no pointers in
Java. All the programmer has access to are references
to the objects he created. The only way of accessing objects in a managed
environment is via object references. So if you create an object which
takes 12 bytes of space in Java, there is no way that you can try to
access the 13th byte, for the simple reason that there are no pointers
in Java with which you could take the object's address p and say *(p+12).
The major advantage of
removing pointers was that the runtime can do the memory management.
Since an object can only be accessed through its references, the
runtime can always clean up the memory used by an object once no
reference to it remains. If an object has no (zero) references
reachable from your code, then the object can never be accessed by any
of your code, which means that the memory used by it is
eligible for garbage collection!
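To make the idea of "no references left" concrete, here is a minimal sketch of how a collector could work out which objects are still reachable. It is a toy model and not the actual dotnet implementation: the Node class, the roots list and the MarkReachable function are hypothetical names invented for this illustration. Note that it follows references outward from a set of starting points (roots) instead of keeping a count per object.

```csharp
using System;
using System.Collections.Generic;

// Toy model of an object on the heap: a name plus the references it holds.
class Node
{
    public string Name;
    public List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

static class ReachabilityDemo
{
    // Returns every node that can be reached by following references
    // from the given roots (think of local variables and static fields).
    public static HashSet<Node> MarkReachable(IEnumerable<Node> roots)
    {
        HashSet<Node> marked = new HashSet<Node>();
        Stack<Node> pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!marked.Add(current)) continue; // already visited
            foreach (Node next in current.References) pending.Push(next);
        }
        return marked;
    }

    static void Main()
    {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        a.References.Add(b);                     // a -> b

        List<Node> roots = new List<Node> { a }; // the program only holds a
        HashSet<Node> live = MarkReachable(roots);

        Console.WriteLine(live.Contains(b)); // True: reachable through a
        Console.WriteLine(live.Contains(c)); // False: nothing refers to c
    }
}
```

In this model, anything that does not end up in the marked set is garbage.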
Garbage Collector:
In this article I will
be talking in depth about how garbage collection works in dotnet.
In dotnet the developers
never explicitly release memory. Instead, this job is done by Garbage
Collector. In the rest of this article I will call Garbage Collector by
its short form GC.
So what is GC? It is a
component of the dotnet runtime which, for the purposes of this article,
you can picture as a thread. What does it do? To summarize, GC frees up
memory used by those objects which can no longer be accessed by your
running code.
For instance consider
the two functions below:
void A()
{
    B();
    int i = 10;
}
void B()
{
    C c = new C();
    Console.WriteLine(c.Name);
}
In the above example,
once control reaches the line int i = 10;
in function A(), there is no way for the
program to access the object c created in function B(). This object is
now eligible for Garbage Collection. So the next time the GC
runs, it can clean up the memory allotted to the object c. (NOTE:
Memory is allotted to the object c when you create an instance of C
using the new keyword.)
When does the GC clean
up the memory used by the object c? Well, there is no guarantee as to
when. Garbage collection is a costly process, so the GC typically runs
only when the runtime is short of memory for new objects (for example,
when a call to create a new object cannot be satisfied) or when the
user explicitly asks for a collection by calling the
function GC.Collect().
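As a small illustration of asking for a collection explicitly, the sketch below calls GC.Collect() and uses GC.CollectionCount(0) to confirm that a collection really took place. The CollectNow helper is a made-up name for this example, and forcing a collection like this is rarely needed in real programs.

```csharp
using System;

static class ForcedCollectionDemo
{
    // Forces a collection and returns how many generation 0 collections
    // were recorded as a result (hypothetical helper for illustration).
    public static int CollectNow()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();                        // explicitly ask the GC to run
        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        // Create some short-lived garbage first.
        for (int i = 0; i < 1000; i++)
        {
            object temp = new byte[1024];
        }
        Console.WriteLine("Collections triggered: " + CollectNow());
    }
}
```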
Considering the amount
of physical memory which systems today usually have (I suppose 512 MB
RAM is a common configuration today), standalone applications (like
Console Applications or Windows Forms Applications) may never really face
a situation where GC has to be called due to memory shortage!
But think of those
background processes, like Windows services, which are supposed to run
theoretically forever on Windows Server machines. In such scenarios,
even a memory leak of 10 bytes per hour adds up to a significant
amount over a period of time and the process may finally run OUT OF
MEMORY! But just a while back I said that in dotnet developers do not
do memory management, and that it is done by the runtime engine using
the Garbage Collector, right? So how can there be a memory leak?
Well, this was what even
I wondered when the first Windows service which I wrote (in C#) crashed
on the test server after two days with an Out Of Memory error and we had
to reboot the server altogether!! The server even refused to open a
small text file in Notepad till we rebooted it!!
The reason here is that GC
can reclaim memory only from those objects which are no longer
accessible in your code by any reference. As you know, a given object can
have multiple references pointing to it. For GC to collect the memory
used by an object, all these references should either go out of
scope (like what happened to the object c in the above code sample once
the execution of function B() was over) or be removed explicitly.
The former case happens
automatically and there is no need for the developer to worry about it,
because once a function returns, all its local objects become
eligible for garbage collection, unless of course references to them
have been passed outside the function.
In the case of the
Windows service which I wrote, the latter was not happening, i.e. I had
objects in my Windows service which were no longer required but still
had references pointing to them. So for them to be eligible for Garbage
Collection I had to remove all the references to these objects
programmatically.
See the code snippet
below:
class A
{
    private B b; //line 3
    public void RequireB()
    {
        b = new B(); //line 6
        b.DoSomeWork();
    }
}
In the above example
we have initially just declared a reference of type B called b. But
this reference is not pointing to any object. In line 6 we create an
object of class B by calling new B() and set the reference b to this
object. So now b is pointing to the object created by new B().
We require this object
only inside the function RequireB(). But since the reference b is at the
class level, the object pointed to by b will not be eligible for garbage
collection even after the execution of function RequireB() is completed,
for as long as the instance of A itself is alive.
This is because that object is still accessible via its reference b! So
to make the object eligible for garbage collection after the function
RequireB() is executed we need to remove the reference to the object by
setting the reference to null as below.
class A
{
    private B b; //line 3
    public void RequireB()
    {
        b = new B(); //line 6
        b.DoSomeWork();
        b = null;
    }
}
See the same code
snippet below where I have created multiple references to the same
object. Garbage collector will not be able to collect the memory
of the object till all the references to it are removed.
class A
{
    private B b; //reference now points to nothing, trying to use this directly will give NullReferenceException
    private B b2; //another empty reference which points to nothing
    public void RequireB()
    {
        b = new B(); //Set this reference to an object
        b2 = b; //Also set b2 as reference to the same object pointed by b
        // Now object created above by calling new B() has 2 references b and b2
        b2.DoSomeWork(); //b2.DoSomeWork() is same as b.DoSomeWork() as both point to same object
        // Just setting b to null will not make the object eligible for garbage
        // collection because b2 is still pointing to it. So you need the second
        // line below too! Else there will be a memory leakage
        b = null;
        b2 = null;
    }
}
NOTE that the above
example code snippet assumes that the object created inside the
RequireB() function is not required outside that
function, else you won't set the references to null.
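The same reasoning applies when the lingering reference lives inside a long-lived collection instead of a field. The sketch below is a hypothetical example (the RequestLog class and its members are invented here, and are not taken from my Windows service): every buffer handed to Handle() stays reachable through a static list until the list lets go of it.

```csharp
using System;
using System.Collections.Generic;

static class RequestLog
{
    // A static list lives as long as the process, so everything it
    // references stays reachable and cannot be garbage collected.
    private static readonly List<byte[]> handled = new List<byte[]>();

    public static int Count { get { return handled.Count; } }

    public static void Handle(byte[] request)
    {
        handled.Add(request);   // reference is kept after the work is done
    }

    // Dropping the references makes the buffers eligible for collection.
    public static void Forget()
    {
        handled.Clear();
    }
}

static class LeakDemo
{
    static void Main()
    {
        for (int i = 0; i < 100; i++)
        {
            RequestLog.Handle(new byte[1024]);
        }
        Console.WriteLine(RequestLog.Count); // 100 buffers still reachable

        RequestLog.Forget();
        Console.WriteLine(RequestLog.Count); // 0: the buffers are now garbage
    }
}
```

In a process which runs for days, forgetting the equivalent of Forget() is exactly the kind of leak described above.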
Coming back to GC again,
we know that every attempt to create an instance of a class using the new
keyword first tries to allocate the number of bytes required by the
object being created. When the process has already created too many
objects (out of which most might already be eligible for garbage
collection by now) there might arise a situation where an attempt to
create a new object fails!!
Memory Allocation:
Look at the animation
below which tries to explain memory allocation in a dotnet managed heap.
The managed heap is the chunk of memory allocated to your process where
all managed objects created in your process are allotted memory. Managed
objects are instances of classes written in dotnet languages (like C# or
VB.Net). So can we have unmanaged objects in dotnet? Yes, but their
memory is allocated outside the managed heap (call it the unmanaged
heap), and the dotnet garbage collector never touches that memory.
Objects which we create through the Win32 API or through COM+ component
wrappers are unmanaged, as the code for these objects is not written
using dotnet specifications/dotnet languages. Simply put, there is no
metadata available for these objects.
So initially when your
program starts the managed heap will be empty and there will be a
next object pointer pointing to the 0th
location on the managed heap. Next object pointer
is an internal reference used by the runtime to identify the location
where the next object has to be created.
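The next object pointer idea can be sketched in a few lines. This is only a toy model under simplifying assumptions (a fixed-size heap, no collection or compaction), and the ToyHeap class with its Allocate method is a hypothetical name, not something exposed by the dotnet runtime.

```csharp
using System;

// Toy model of a managed heap: a fixed block of bytes plus a
// "next object pointer" that marks where the next object will go.
class ToyHeap
{
    private readonly byte[] memory;
    private int nextObjectPointer = 0;   // starts at location 0

    public ToyHeap(int size) { memory = new byte[size]; }

    public int NextObjectPointer { get { return nextObjectPointer; } }

    // Reserves 'size' bytes and returns the address of the new object,
    // or -1 when there is not enough room left (the point at which a
    // real runtime would trigger a garbage collection).
    public int Allocate(int size)
    {
        if (nextObjectPointer + size > memory.Length) return -1;
        int address = nextObjectPointer;
        nextObjectPointer += size;       // simply move the pointer forward
        return address;
    }
}

static class ToyHeapDemo
{
    static void Main()
    {
        ToyHeap heap = new ToyHeap(32);
        Console.WriteLine(heap.Allocate(12)); // 0
        Console.WriteLine(heap.Allocate(12)); // 12
        Console.WriteLine(heap.Allocate(12)); // -1: only 8 bytes are left
    }
}
```

Allocation in this model is just an addition, which is why creating objects on a managed heap is cheap as long as there is free space after the pointer.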
See the animation below.
I strongly suggest
viewing the animation
above completely by clicking the next button and understanding the
simplified version of Garbage Collection process in dotnet. This will
help you to easily understand the things which I am going to tell now.
Assuming that you have
gone through the above animation, we will continue with the
remaining part of the story.
Behind the Scenes:
There is one obvious
difference between writing Object Oriented Programs in managed and
unmanaged environments. C++ programmers, do you remember destructors in
classes?
For those who don't know
what destructors are, they are functions which are called to do the last
rites of an object just before it is freed from memory. In C++, the
developer frees the memory explicitly (for example by calling
delete k;), so he knows exactly when the destructor runs. But in a
managed environment like dotnet, memory is freed by the Garbage
Collector and god knows when that will happen!! So we cannot have C++
style destructors in dotnet!
This is because
destructors are usually used to clean up resources used by the object,
like closing an open database connection held by the object being
cleaned, or closing an open file held by the object being cleaned etc.
Delaying this till the garbage collection happens means keeping all
these resources open even when they are not used. And who knows, if a
program doesn't face any memory crunch then GC may not be called
at all, thereby holding these resources forever!!
So in managed
environment there is no concept of destructors.
Now then what do we do
about resources which need to be cleaned when an object is no longer
required?
Well, for this the design
pattern in dotnet is to implement the IDisposable
interface on such classes. This interface has only one method called
Dispose() and we are supposed to do in this method all
those cleanup activities which would otherwise have been done inside
the destructor.
So all code which was
supposed to go into the destructor (if it were an unmanaged environment)
will now go into the Dispose() method in
dotnet managed environment.
Now who calls this
Dispose method and when? The catch is that the Dispose()
method has to be called by the developer once he feels that an object is
no longer required. Usually this is done just before an object goes out
of scope or just before the last reference to an object is set to
null.
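A minimal sketch of this pattern is shown below. The ResourceHolder class and its IsOpen flag are hypothetical stand-ins for a class holding a real resource such as a file or a database connection; the try/finally block is there so that Dispose() is called even if the work in between throws.

```csharp
using System;

// Hypothetical class that owns a resource needing explicit cleanup.
class ResourceHolder : IDisposable
{
    public bool IsOpen = true;           // stands in for an open file or connection

    public void DoSomeWork()
    {
        Console.WriteLine("Working, resource open: " + IsOpen);
    }

    public void Dispose()
    {
        IsOpen = false;                  // release the resource here
    }
}

static class DisposeDemo
{
    // Uses the object and returns it so the caller can inspect its state.
    public static ResourceHolder UseAndDispose()
    {
        ResourceHolder holder = new ResourceHolder();
        try
        {
            holder.DoSomeWork();
        }
        finally
        {
            holder.Dispose();            // runs even if DoSomeWork() throws
        }
        return holder;
    }

    static void Main()
    {
        ResourceHolder holder = UseAndDispose();
        Console.WriteLine("Resource open after Dispose: " + holder.IsOpen);
    }
}
```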
Note that the dotnet runtime
knows nothing about the Dispose() method.
As I mentioned earlier this is just a design pattern. There is no hard
and fast rule that you have to do cleanup only by
implementing the IDisposable interface and
writing the cleanup code in its Dispose()
method. You can just add a method to your class called Cleanup() or say
Clear() or whatever and write the cleanup code in that method, but then
you will have to ensure that this method is always called to perform the
cleanup. Since other developers
might be using your class, implementing
IDisposable is the common convention: any developer who
sees this interface being implemented by a class automatically
understands that he needs to call the Dispose()
method on objects of this class when they are no longer required.
Note that the dotnet FCL, i.e.
the class library, has a lot of classes (especially in ADO.Net, like
SqlConnection) which implement the IDisposable
interface.
Another advantage for C#
developers of implementing IDisposable
interface instead of any other alternative is that there is a special
keyword in C# called using (not the one
used to import namespaces) which automatically calls the
Dispose() method on objects defined within
its scope.
See the code snippet
below.
B b = new B();
using (b)
{
    // use b
} // here the compiler-generated code calls Dispose() on b automatically
Note that for an object
reference to be specified alongside the using keyword, its class (in
this case class B) MUST have implemented the
IDisposable interface so that the Dispose()
method could be called once the scope of the using keyword ends. Else
you will get a compile time error.
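Roughly speaking, a using block behaves like a try/finally block with the Dispose() call in the finally part. The sketch below puts the two forms side by side; the Connection class and its DisposeCalls counter are hypothetical and exist only so that the effect can be observed.

```csharp
using System;

// Hypothetical disposable class that counts how often Dispose() runs.
class Connection : IDisposable
{
    public int DisposeCalls = 0;
    public void Dispose() { DisposeCalls++; }
}

static class UsingDemo
{
    // Version 1: let the using statement call Dispose().
    public static Connection WithUsing()
    {
        Connection c = new Connection();
        using (c)
        {
            // use c
        }
        return c;
    }

    // Version 2: roughly what the compiler generates for version 1.
    public static Connection WithTryFinally()
    {
        Connection c = new Connection();
        try
        {
            // use c
        }
        finally
        {
            if (c != null) c.Dispose();
        }
        return c;
    }

    static void Main()
    {
        Console.WriteLine(WithUsing().DisposeCalls);      // 1
        Console.WriteLine(WithTryFinally().DisposeCalls); // 1
    }
}
```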
Now that I have spoken
about C#, C# developers might be wondering then what about the functions
which they can write in C# classes whose syntax is similar to those of
C++ destructors. I already said there are no destructors in dotnet. Then
what are these?
Well, the dotnet framework
also provides a function called Finalize() for all classes which, when
overridden with some code, will be called by the Garbage Collector just
before the object is garbage collected!
Now why is this function
required? Well, as I said earlier Dispose() must be programmatically
called by developers (in C# one can of course use the using keyword, but
this cannot be used when the scope of an object is say at the class
level) to clean up resources if the class being implemented requires any
such cleanup. What if the developer who is creating objects of such a
class by chance forgets to call the Dispose()
method on such an object? (Even though this mistake is unpardonable :-) )
Well, so in that case
the developer of the class implementing the
IDisposable interface can call the Dispose() method in the
Finalize() method so that the cleanup is at least done when the Garbage
Collector is invoked by the system. As I said earlier GC calls Finalize
methods on objects which have some code implemented in these methods.
(NOTE again there is no guarantee that GC calls Finalize() methods on
all objects which implement the method! Pretty confusing? I'll come to
this later).
Coming back to C#,
functions with destructor syntax are nothing but Finalize() methods, of
course with a call to base class Finalize() method.
In the C# example below:
class K
{
    K() {}
    ~K() {}
}
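The C# compiler translates the above code roughly as follows (you cannot
write this form yourself in C#; the compiler generates it):
class K
{
    K() {}
    protected override void Finalize()
    {
        try
        {
            // code from the body of ~K() goes here
        }
        finally
        {
            base.Finalize();
        }
    }
}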
Now imagine a situation
where you are calling the Dispose() method in your Finalize() method to
ensure that Dispose() gets called at least during Garbage Collection
even if the developer using your class forgets to call it. What if the
developer also calls your Dispose() method? Then when the GC comes and
calls the Dispose() method by calling Finalize(), won't there be a
problem because Dispose() has already been called?
There are two things to
be noted here.
#1: You should implement
your Dispose() method in such a way that it throws no exceptions no
matter how many times it is called, so that if the developer using your
class accidentally calls the Dispose() method multiple times, nothing
goes wrong from the second call onwards. Also note that early versions
of the dotnet runtime ignored any error/exception thrown by a Finalize()
method and simply assumed that it completed normally, whereas from
version 2.0 onwards an unhandled exception in a Finalize() method
terminates the process by default. Either way, a Finalize() method
should never throw!
#2: Making the runtime
call the Finalize() method is a costly operation. GC usually occurs when
there is a memory crunch, and while GC is taking place all other threads
are suspended till it completes its activity. On top of that, objects
which implement a Finalize() method have to undergo an extra GC
lifecycle before their memory is recovered, compared to other objects
which do not implement Finalize()! (Will come to this later)
So what's the solution to
our problem? Well, now that we are calling Dispose() from the Finalize()
method too, there is a way to prevent Finalize() being called if
Dispose() has already been called by the developer. All you have to do
is call GC.SuppressFinalize(this) at the end of your Dispose() method.
This instructs the dotnet runtime not to call the Finalize() method on
this object even if its class has implemented one. Here's a
sample code snippet to achieve the same.
class B : IDisposable
{
    public void Dispose()
    {
        //Do all cleanup activities
        GC.SuppressFinalize(this); //the object to skip must be passed in
    }
    ~B()
    {
        Dispose();
    }
}
In the above code
snippet, we have ensured two things:
#1: In case the
developer using our class forgets to call the Dispose() method on its
object, at least dotnet runtime will call the Dispose() method when it
calls the Finalize() method (Remember C# destructor is nothing but the
Finalize() method in disguise!)
#2: In case the developer using our class correctly calls Dispose() method on its objects then, we have ensured that the dotnet runtime does not call Finalize() method on such objects.
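Putting the points of this section together, here is a minimal sketch of a class whose Dispose() is safe to call any number of times and which suppresses finalization once cleanup has been done. The SafeResource class, its disposed flag and its CleanupRuns counter are hypothetical names used only for illustration. The framework guidelines describe a slightly more elaborate variant built around a protected Dispose(bool) method, which is not shown here.

```csharp
using System;

// Hypothetical class combining the points above: Dispose() is safe to
// call repeatedly and suppresses the finalizer once cleanup is done.
class SafeResource : IDisposable
{
    private bool disposed = false;
    public int CleanupRuns = 0;          // only here so the demo can observe cleanup

    public void Dispose()
    {
        if (!disposed)
        {
            CleanupRuns++;               // do all cleanup activities here
            disposed = true;
        }
        GC.SuppressFinalize(this);       // the finalizer is no longer needed
    }

    ~SafeResource()
    {
        Dispose();                       // safety net if the caller forgot
    }
}

static class SafeResourceDemo
{
    static void Main()
    {
        SafeResource r = new SafeResource();
        r.Dispose();
        r.Dispose();                      // second call is harmless
        Console.WriteLine(r.CleanupRuns); // 1
    }
}
```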