Garbage Collection, Part 2: Automatic Memory Management: Weak References
Weak References
When a root points to an object, the object cannot be collected because the application's code
can reach it. Such a reference from a root is called a strong reference to the object.
However, the garbage collector also supports weak references. Weak references allow the garbage
collector to collect the object, but they also allow the application to access the object. How can this
be? It all comes down to timing.
If only weak references to an object exist and the garbage collector runs, the object is collected,
and when the application later attempts to access the object, the access fails. To access a weakly
referenced object, however, the application must first obtain a strong reference to it. If the
application obtains this strong reference before the garbage collector collects the object, the
garbage collector can't collect the object, because a strong reference to it now exists. I know this
all sounds somewhat confusing, so let's clear it up by examining the code in Figure 1.
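Figure 1 itself is not reproduced in this text. As a stand-in, here is a minimal C# sketch of the
timing argument above; the class and method names are hypothetical, not taken from the article. One
method obtains a strong reference through Target before forcing a collection with GC.Collect, and
the other leaves only a weak reference when the collection runs. The second result depends on the
JIT and the collector, so treat it as typical behavior rather than a guarantee.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical demo class (a sketch, not the article's Figure 1).
public static class WeakTimingDemo
{
    // Creates an object reachable only through the returned weak reference.
    // NoInlining keeps the temporary strong reference inside this method's frame,
    // so it is gone once the method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateWeaklyHeldObject()
    {
        return new WeakReference(new object());
    }

    // Case 1: a strong reference is obtained through Target BEFORE the collection,
    // so the object must survive.
    public static bool SurvivesWhenStrongReferenceTakenFirst()
    {
        object original = new object();
        var weak = new WeakReference(original);

        object strong = weak.Target;   // turn the weak reference into a strong one
        GC.KeepAlive(original);        // 'original' was reachable until here, so Target was non-null

        GC.Collect();                  // only 'strong' (a root) keeps the object alive now
        GC.WaitForPendingFinalizers();

        bool survived = weak.Target != null;
        GC.KeepAlive(strong);          // 'strong' stays a live root up to this point
        return survived;
    }

    // Case 2: only a weak reference exists when the collection runs, so the object
    // is usually reclaimed and Target then returns null.
    public static bool SurvivesWithOnlyWeakReference()
    {
        WeakReference weak = CreateWeaklyHeldObject();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return weak.Target != null;
    }

    public static void Main()
    {
        Console.WriteLine("Strong reference taken first, survived: " +
                          SurvivesWhenStrongReferenceTakenFirst());
        Console.WriteLine("Only a weak reference, survived: " +
                          SurvivesWithOnlyWeakReference());
    }
}
```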
Why might you use weak references? Well, some data structures are easy to create but require a lot
of memory. For example, you might have an application that needs to know all the directories and
files on the user's hard drive. You can easily build a tree that reflects this information, and as
your application runs, you'll refer to the tree in memory instead of actually accessing the user's
hard disk. This procedure greatly improves the performance of your application.
The problem is that the tree could be extremely large, requiring quite a bit of memory. If the
user starts accessing a different part of your application, the tree may no longer be needed and is
wasting valuable memory. You could delete the tree, but if the user switches back to the first part
of your application, you'll need to reconstruct it. Weak references allow you to handle this
scenario quite easily and efficiently.
When the user switches away from the first part of the application, you can create a weak
reference to the tree and destroy all strong references. If the other part of the application puts
little load on memory, garbage collections may not occur, and the tree's objects will not be
reclaimed. When the user switches back to the first part of the application, the application
attempts to obtain a strong reference to the tree. If successful, the application doesn't have to
traverse the user's hard drive again; if not, the tree was collected and must be rebuilt.
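The tree scenario can be sketched as a small cache that holds the expensive structure only weakly
and rebuilds it when Target comes back null. WeakCache<T>, its build delegate, and BuildCount are
hypothetical names used for illustration; a real application would pass a delegate that actually
walks the disk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: keeps a rebuildable object only weakly and rebuilds it on demand
// if the garbage collector has reclaimed it.
public sealed class WeakCache<T> where T : class
{
    private readonly Func<T> _build;   // e.g., scans the disk to build the tree
    private WeakReference _weak;       // null until the first build
    public int BuildCount { get; private set; }

    public WeakCache(Func<T> build) { _build = build; }

    // Returns a strong reference; the caller keeps the data alive by holding it.
    public T Get()
    {
        // Read Target once into a local: that local is the strong reference.
        T value = _weak?.Target as T;
        if (value == null)
        {
            value = _build();          // never built, or already collected: rebuild
            BuildCount++;
            _weak = new WeakReference(value);
        }
        return value;
    }
}

public static class WeakCacheDemo
{
    public static void Main()
    {
        // Hypothetical stand-in for an expensive scan of the user's hard drive.
        var cache = new WeakCache<List<string>>(
            () => new List<string> { @"C:\", @"C:\Windows", @"C:\Windows\System32" });

        List<string> tree = cache.Get();   // first part of the app holds a strong reference
        Console.WriteLine(tree.Count);     // prints 3
        tree = null;                       // user switches away: drop the strong reference

        tree = cache.Get();                // user switches back: reuse or rebuild
        Console.WriteLine("Builds: " + cache.BuildCount); // 1 if reused, 2 if rebuilt
    }
}
```

Holding the value returned by Get is what keeps the tree alive while that part of the application
is in use; dropping it is what lets the collector reclaim the tree when memory gets tight.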
The WeakReference type offers two constructors:
WeakReference(Object target);
WeakReference(Object target, Boolean trackResurrection);
The target parameter identifies the object that the WeakReference object should track. The
trackResurrection parameter indicates whether the WeakReference object should track the object
after it has had its Finalize method called. Usually, false is passed for the trackResurrection
parameter; the first constructor is equivalent to passing false and creates a WeakReference that
does not track resurrection. (For an explanation of resurrection, see part 1 of this article at
http://msdn.microsoft.com/msdnmag/issues/1100/GCI/GCI.asp.)
For convenience, a weak reference that does not track resurrection is called a short weak
reference, while a weak reference that does track resurrection is called a long weak reference. If an
object's type doesn't offer a Finalize method, then short and long weak references behave
identically. It is strongly recommended that you avoid using long weak references: they allow you
to resurrect an object after it has been finalized, at which point the object's state is
unpredictable.
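Here is a hedged C# sketch of the two constructors and of how short and long weak references can
differ for a finalizable object. Resurrectable is a hypothetical type that resurrects itself in its
finalizer (written with C# destructor syntax). The lines that depend on the collection print
whatever the runtime actually did, so no output is claimed for them.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical finalizable type that resurrects itself by storing 'this' in a static root.
public class Resurrectable
{
    public static Resurrectable Rescued;
    ~Resurrectable() { Rescued = this; }
}

public static class ShortVsLongDemo
{
    // NoInlining keeps the only strong reference inside this method's frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static (WeakReference shortRef, WeakReference longRef) CreateRefs()
    {
        var obj = new Resurrectable();
        return (new WeakReference(obj),          // short: trackResurrection defaults to false
                new WeakReference(obj, true));   // long: trackResurrection = true
    }

    public static void Main()
    {
        var (shortRef, longRef) = CreateRefs();
        Console.WriteLine(shortRef.TrackResurrection);  // prints False
        Console.WriteLine(longRef.TrackResurrection);   // prints True

        GC.Collect();                   // object is unreachable; its finalizer is queued
        GC.WaitForPendingFinalizers();  // finalizer runs and resurrects the object

        // A short weak reference is cleared when the object becomes unreachable,
        // before finalization; a long one can still reach the finalized object.
        Console.WriteLine("short alive: " + shortRef.IsAlive);
        Console.WriteLine("long alive:  " + longRef.IsAlive);
    }
}
```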
Once you've created a weak reference to an object, you usually set the strong reference to the
object to null. If any strong reference remains, the garbage collector will be unable to collect the
object.
To use the object again, you must turn the weak reference into a strong reference. You
accomplish this simply by querying the WeakReference object's Target property and assigning the
result to one of your application's roots. If the Target property returns null, then the object was
collected. If the property does not return null, then the root is a strong reference to the object and
the code may manipulate the object. As long as the strong reference exists, the object cannot be
collected.
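The conversion from a weak to a strong reference is easy to get subtly wrong. The sketch below
(hypothetical names) reads Target exactly once into a local variable and tests that local for null.
The contrasting method checks IsAlive first, which leaves a window in which a collection can clear
the reference before Target is read.

```csharp
using System;
using System.Text;

public static class TargetIdiom
{
    // Turns the weak reference into a strong one and uses it.
    // Returns the new text length, or -1 if the StringBuilder was collected.
    public static int UseIfAlive(WeakReference weak)
    {
        // Read Target ONCE into a local. Once it is non-null, the local is a root,
        // so the object cannot be collected while this method uses it.
        var sb = weak.Target as StringBuilder;
        if (sb == null)
            return -1;          // collected: the caller must recreate the object
        sb.Append('!');         // safe: 'sb' is a strong reference
        return sb.Length;
    }

    // Anti-pattern, shown for contrast: a collection can run between the IsAlive
    // check and the Target read, so the cast result may be null.
    public static int UseIfAliveRacy(WeakReference weak)
    {
        if (weak.IsAlive)
            return ((StringBuilder)weak.Target).Length; // may throw NullReferenceException
        return -1;
    }

    public static void Main()
    {
        var sb = new StringBuilder("hi");
        var weak = new WeakReference(sb);
        Console.WriteLine(UseIfAlive(weak)); // prints 3: 'sb' is still a strong root
        GC.KeepAlive(sb);
    }
}
```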
Generations
When I first started working in a garbage-collected environment, I had many concerns about
performance. After all, I've been a C/C++ programmer for more than 15 years, and I understand
the overhead of allocating and freeing memory blocks from a heap. Sure, each version of
Windows® and each version of the C runtime has tweaked the internals of the heap algorithms to
improve performance.
Well, like the developers of Windows and the C runtime, the GC developers are tweaking the
garbage collector to improve its performance. One feature of the garbage collector that exists
purely to improve performance is called generations. A generational garbage collector (also known
as an ephemeral garbage collector) makes the following assumptions:
• The newer an object is, the shorter its lifetime will be.
• The older an object is, the longer its lifetime will be.
• Newer objects tend to have strong relationships to each other and are frequently accessed
around the same time.
• Compacting a portion of the heap is faster than compacting the whole heap.
Many studies have demonstrated that these assumptions are valid for a very large set of existing
applications. So, let's discuss how these assumptions have influenced the implementation of the
garbage collector.
When initialized, the managed heap contains no objects. Objects added to the heap are said to
be in generation 0, as you can see in Figure 2. Stated simply, objects in generation 0 are young
objects that have never been examined by the garbage collector.
Figure 2 Generation 0
Now, if more objects are added to the heap, the heap fills and a garbage collection must occur.
When the garbage collector analyzes the heap, it builds a graph of the objects that are still
reachable; every object not in that graph is garbage. Any objects that survive the collection are
compacted into the left-most portion of the heap. These objects have survived a collection, are
older, and are now considered to be in generation 1 (see Figure 3).
As even more objects are added to the heap, these new, young objects are placed in
generation 0. If generation 0 fills again, a GC is performed. This time, all objects in generation 1
that survive are compacted and considered to be in generation 2 (see Figure 4). All survivors in
generation 0 are now compacted and considered to be in generation 1. Generation 0 currently
contains no objects, but all new objects will go into generation 0.
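The promotion sequence above can be modeled with a toy heap in C#. This is an assumption-laden
simulation, not the runtime's code, and every name in it is hypothetical. Each Collect call here
examines all generations, which matches the sequence just described but ignores the real
collector's ability to collect only the young generations; survivors are capped at generation 2,
the runtime's current maximum.

```csharp
using System;
using System.Collections.Generic;

// Toy object: only tracks its generation and whether the program can still reach it.
public sealed class ToyObject
{
    public string Name { get; }
    public int Generation { get; internal set; }   // 0 when allocated
    public bool Reachable { get; set; } = true;
    public ToyObject(string name) { Name = name; }
}

public sealed class ToyGenerationalHeap
{
    public const int MaxGeneration = 2;
    private readonly List<ToyObject> _objects = new List<ToyObject>();
    public int Count => _objects.Count;

    public ToyObject Allocate(string name)
    {
        var obj = new ToyObject(name);   // new objects always start in generation 0
        _objects.Add(obj);
        return obj;
    }

    // Reclaims unreachable objects and promotes every survivor by one generation,
    // never beyond MaxGeneration.
    public void Collect()
    {
        _objects.RemoveAll(o => !o.Reachable);
        foreach (var o in _objects)
            o.Generation = Math.Min(o.Generation + 1, MaxGeneration);
    }
}

public static class ToyGenerationsDemo
{
    public static void Main()
    {
        var heap = new ToyGenerationalHeap();
        ToyObject a = heap.Allocate("A");
        ToyObject b = heap.Allocate("B");
        b.Reachable = false;                // B becomes garbage

        heap.Collect();                     // A: 0 -> 1; B is reclaimed
        ToyObject c = heap.Allocate("C");   // C goes into generation 0
        heap.Collect();                     // A: 1 -> 2, C: 0 -> 1
        Console.WriteLine($"A={a.Generation} C={c.Generation}"); // prints "A=2 C=1"
        heap.Collect();                     // A stays in 2, C: 1 -> 2
        Console.WriteLine($"A={a.Generation} C={c.Generation}"); // prints "A=2 C=2"
    }
}
```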
Currently, generation 2 is the highest generation supported by the runtime's garbage collector.
When future collections occur, any surviving objects currently in generation 2 simply stay in
generation 2.
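On the real runtime you can watch promotion with System.GC: GC.GetGeneration reports an object's
current generation, and GC.MaxGeneration reports the highest one. The sketch below (hypothetical
class and method names) forces collections with GC.Collect purely for demonstration. The exact
sequence depends on the runtime, so the comments describe what the model above predicts rather
than guaranteed output.

```csharp
using System;
using System.Collections.Generic;

public static class GenerationDemo
{
    // Records an object's generation right after allocation and after each forced collection.
    public static List<int> TrackGenerations(int collections)
    {
        var survivor = new object();               // kept reachable through this local
        int initial = GC.GetGeneration(survivor);  // read before any other allocation
        var generations = new List<int> { initial };
        for (int i = 0; i < collections; i++)
        {
            GC.Collect();                          // full, blocking collection
            generations.Add(GC.GetGeneration(survivor));
        }
        GC.KeepAlive(survivor);
        return generations;
    }

    public static void Main()
    {
        Console.WriteLine("GC.MaxGeneration = " + GC.MaxGeneration);
        // Expect the survivor to move up one generation per collection
        // until it reaches GC.MaxGeneration, then stay there.
        Console.WriteLine(string.Join(", ", TrackGenerations(3)));
    }
}
```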
Most heaps (like the C runtime heap) allocate objects wherever they find free space. Therefore,
if I create several objects consecutively, it is quite possible that these objects will be separated by
megabytes of address space. However, in the managed heap, allocating several objects
consecutively ensures that the objects are contiguous in memory.
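Consecutive allocations end up contiguous because the managed heap allocates by advancing a single
pointer. The toy allocator below is a hypothetical illustration of that idea using integer offsets
instead of real addresses; it omits object headers, alignment, and the collection step.

```csharp
using System;

// Toy bump-pointer allocator: one "next free" offset that only moves forward,
// so consecutive allocations are adjacent. Nothing is actually allocated.
public sealed class BumpAllocator
{
    private readonly int _capacity;
    private int _next;   // offset where the next object will be placed

    public BumpAllocator(int capacity) { _capacity = capacity; }

    // Returns the starting offset of the new block, or -1 if it doesn't fit
    // (the point at which a real managed heap would trigger a garbage collection).
    public int Allocate(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (_next + size > _capacity) return -1;
        int start = _next;
        _next += size;   // just bump the pointer
        return start;
    }

    public static void Main()
    {
        var heap = new BumpAllocator(1024);
        int a = heap.Allocate(24);
        int b = heap.Allocate(40);
        int c = heap.Allocate(16);
        Console.WriteLine($"{a} {b} {c}"); // prints "0 24 64": back to back
    }
}
```

Contrast this with a free-list heap, which searches for any gap large enough and can therefore
scatter consecutive allocations across the address space.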
One of the assumptions stated earlier was that newer objects tend to have strong relationships
to each other and are frequently accessed around the same time. Since new objects are allocated
contiguously in memory, you gain performance from locality of reference. More specifically, it is
highly likely that all the objects can reside in the CPU's cache. Your application will access these
objects with phenomenal speed, since the CPU will be able to perform most of its manipulations
without cache misses, which force RAM access.
Microsoft's performance tests show that managed heap allocations are faster than standard
allocations performed by the Win32 HeapAlloc function. These tests also show that it takes less
than 1 millisecond on a 200 MHz Pentium to perform a full GC of generation 0. It is Microsoft's goal
to make GCs take no more time than an ordinary page fault.
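If you want a feel for how long a generation-0 collection takes on your own hardware, a rough
sketch is to create some short-lived garbage and time GC.Collect(0) with Stopwatch. The class and
method names are hypothetical, and a single forced collection on modern hardware is not comparable
to Microsoft's measurements above, so treat the number as anecdotal.

```csharp
using System;
using System.Diagnostics;

public static class Gen0Timing
{
    // Allocates short-lived garbage, then times one forced generation-0 collection.
    public static TimeSpan TimeGen0Collection(int garbageObjects)
    {
        for (int i = 0; i < garbageObjects; i++)
        {
            var temp = new byte[64];   // unreachable as soon as the iteration ends
            temp[0] = 1;
        }
        var sw = Stopwatch.StartNew();
        GC.Collect(0);                 // collect generation 0 only
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        TimeSpan t = TimeGen0Collection(100_000);
        Console.WriteLine($"Gen 0 collection took {t.TotalMilliseconds:F3} ms");
    }
}
```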
To monitor the runtime's garbage collector, open the Windows 2000 System Monitor, click the Add
(+) toolbar button, and select the COM+ Memory performance object (in released versions of the
.NET Framework, this object is named .NET CLR Memory). Then, select a specific application from the
instance list box. Finally, select the set of counters that you're interested in monitoring and
click the Add button followed by the Close button. At this point, the System Monitor graphs the
selected real-time statistics. Figure 7 describes the function of each counter.
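Later versions of the .NET Framework (2.0 onward) also expose collection counts directly through
GC.CollectionCount, which can complement the System Monitor counters from inside your own code. A
minimal sketch, with hypothetical class and method names:

```csharp
using System;

public static class CollectionCounts
{
    // Returns how many times each generation has been collected in this process so far.
    public static int[] Snapshot()
    {
        var counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            counts[gen] = GC.CollectionCount(gen);
        return counts;
    }

    public static void Main()
    {
        int[] before = Snapshot();
        GC.Collect(0);   // force one generation-0 collection
        int[] after = Snapshot();
        for (int gen = 0; gen < after.Length; gen++)
            Console.WriteLine($"Gen {gen}: {after[gen] - before[gen]} collection(s) during the test");
    }
}
```

Taking a snapshot before and after a piece of work, as Main does, isolates the collections that
the work caused from those that happened earlier in the process.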
Conclusion
So that's just about the full story on garbage collection. Last month I provided the background
on how resources are allocated, how automatic garbage collection works, how to use the
finalization feature to allow an object to clean up after itself, and how the resurrection feature can
restore access to objects. This month I explained how weak and strong references to objects are
implemented, how classifying objects in generations results in performance benefits, and how you
can manually control garbage collection with System.GC. I also covered the mechanisms the
garbage collector uses in multithreaded applications to improve performance, what happens with
objects that are larger than 20,000 bytes, and finally, how you can use the Windows 2000 System
Monitor to track garbage collection performance. With this information in hand, you should be able
to simplify memory management and boost performance in your applications.
For related articles see:
http://msdn.microsoft.com/msdnmag/issues/1100/GCI/GCI.asp
For background information see:
Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Richard Jones and Rafael Lins (John Wiley & Sons, 1996)
Programming Applications for Microsoft Windows by Jeffrey Richter (Microsoft Press, 1999)
Jeffrey Richter (http://www.JeffreyRichter.com) is the author of Programming Applications for
Microsoft Windows (Microsoft Press, 1999), and is a co-founder of Wintellect
(http://www.Wintellect.com), a software education, debugging, and consulting firm.