diff --git a/appendix.rst b/appendix.rst
index f2db2ac2e..793f615c8 100644
--- a/appendix.rst
+++ b/appendix.rst
@@ -51,6 +51,7 @@ Language development in depth
 * :doc:`exploring`
 * :doc:`grammar`
 * :doc:`compiler`
+* :doc:`garbage_collector`
 * :doc:`stdlibchanges`
 * :doc:`langchanges`
 * :doc:`porting`
@@ -64,4 +65,4 @@ Testing and continuous integration
 * :doc:`buildbots`
 * :doc:`buildworker`
 * :doc:`coverity`
-
\ No newline at end of file
+
diff --git a/garbage_collector.rst b/garbage_collector.rst
new file mode 100644
index 000000000..98ceff178
--- /dev/null
+++ b/garbage_collector.rst
@@ -0,0 +1,487 @@
+.. _gc:
+
+Design of CPython's Garbage Collector
+=====================================
+
+:Author: Pablo Galindo Salgado
+
+.. highlight:: none
+
+Abstract
+--------
+
+The main garbage collection mechanism of CPython is reference counting. The basic idea is
+that CPython counts how many different places there are that have a reference to an
+object. Such a place could be another object, or a global (or static) C variable, or
+a local variable in some C function. When an object’s reference count becomes zero,
+the object is deallocated. If it contains references to other objects, their
+reference count is decremented. Those other objects may be deallocated in turn, if
+this decrement makes their reference count become zero, and so on. The reference
+count field can be examined using the ``sys.getrefcount`` function (notice that the
+value returned by this function is always 1 higher than the actual count, because the
+function itself also holds a reference to the object while it is being called):
+
+.. code-block:: python
+
+    >>> import sys
+    >>> x = object()
+    >>> sys.getrefcount(x)
+    2
+    >>> y = x
+    >>> sys.getrefcount(x)
+    3
+    >>> del y
+    >>> sys.getrefcount(x)
+    2
+
+The main problem with the reference counting scheme is that it
+does not handle reference cycles. For instance, consider this code:
+
+.. code-block:: python
+
+    >>> container = []
+    >>> container.append(container)
+    >>> sys.getrefcount(container)
+    3
+    >>> del container
+
+In this example, ``container`` holds a reference to itself, so even when we remove
+our reference to it (the variable "container") the reference count never falls to 0
+because it still has its own internal reference and therefore it will never be
+cleaned just by simple reference counting. For this reason some additional machinery
+is needed to clean these reference cycles between objects once they become
+unreachable. This is the cyclic garbage collector, usually called just Garbage
+Collector (GC), even though reference counting is also a form of garbage collection.
+
+Memory layout and object structure
+----------------------------------
+
+Normally the C structure supporting a regular Python object looks as follows:
+
+.. code-block:: none
+
+    object -----> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
+                  |                   ob_refcnt                   | |
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD
+                  |                   *ob_type                    | |
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
+                  |                      ...                      |
+
+
+In order to support the garbage collector, the memory layout of objects is altered
+to accommodate extra information **before** the normal layout:
+
+.. code-block:: none
+
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
+                  |                   *_gc_next                   | |
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyGC_Head
+                  |                   *_gc_prev                   | |
+    object -----> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
+                  |                   ob_refcnt                   | \
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD
+                  |                   *ob_type                    | |
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
+                  |                      ...                      |
+
+
+In this way the object can be treated as a normal Python object and when the extra
+information associated with the GC is needed the previous fields can be accessed by a
+simple type cast from the original object: :code:`((PyGC_Head *)(the_object)-1)`.
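+
+As a rough illustration from Python, the size of this extra header can be
+estimated by comparing ``sys.getsizeof`` (which is documented to add the garbage
+collector overhead for objects managed by the GC) with the object's own
+``__sizeof__``. This is only a sketch: the helper name ``gc_overhead`` is
+hypothetical, and the number of bytes reported depends on the CPython version
+and build.
+
+.. code-block:: python
+
+    import sys
+
+    def gc_overhead(obj):
+        """Bytes that sys.getsizeof adds on top of obj.__sizeof__()."""
+        return sys.getsizeof(obj) - obj.__sizeof__()
+
+    # Lists support the GC, so they usually carry the extra header.
+    print(gc_overhead([]))
+    # Integers are not containers, so no GC header is added for them.
+    print(gc_overhead(0))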
+
+As is explained later in the `Optimization: reusing fields to save memory`_ section,
+these two extra fields are normally used to keep doubly linked lists of all the
+objects tracked by the garbage collector (these lists are the GC generations, more on
+that in the `Optimization: generations`_ section), but they are also
+reused to fulfill other purposes when the full doubly linked list structure is not
+needed as a memory optimization.
+
+Doubly linked lists are used because they efficiently support most frequently required operations. In
+general, the collection of all objects tracked by GC is partitioned into disjoint sets, each in its own
+doubly linked list. Between collections, objects are partitioned into "generations", reflecting how
+often they've survived collection attempts. During collections, the generation(s) being collected
+are further partitioned into, e.g., sets of reachable and unreachable objects. Doubly linked lists
+support moving an object from one partition to another, adding a new object, removing an object
+entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC
+isn't running at all!), and merging partitions, all with a small constant number of pointer updates.
+With care, they also support iterating over a partition while objects are being added to - and
+removed from - it, which is frequently required while GC is running.
+
+Specific APIs are offered to allocate, deallocate, initialize, track, and untrack
+objects with GC support. These APIs can be found in the `Garbage Collector C API
+documentation `_.
+
+Apart from this object structure, the type object for objects supporting garbage
+collection must include the ``Py_TPFLAGS_HAVE_GC`` in its ``tp_flags`` slot and
+provide an implementation of the ``tp_traverse`` handler. Unless it can be proven
+that the objects cannot form reference cycles with only objects of its type or unless
+the type is immutable, a ``tp_clear`` implementation must also be provided.
+
+
+Identifying reference cycles
+----------------------------
+
+The algorithm that CPython uses to detect those reference cycles is
+implemented in the ``gc`` module. The garbage collector **only focuses**
+on cleaning container objects (i.e. objects that can contain a reference
+to one or more objects). These can be arrays, dictionaries, lists, custom
+class instances, classes in extension modules, etc. One could think that
+cycles are uncommon but the truth is that many internal references needed by
+the interpreter create cycles everywhere. Some notable examples:
+
+    * Exceptions contain traceback objects that contain a list of frames that
+      contain the exception itself.
+    * Module-level functions reference the module's dict (which is needed to resolve globals),
+      which in turn contains entries for the module-level functions.
+    * Instances have references to their class which itself references its module, and the module
+      contains references to everything that is inside (and maybe other modules)
+      and this can lead back to the original instance.
+    * When representing data structures like graphs, it is very typical for them to
+      have internal links to themselves.
+
+To correctly dispose of these objects once they become unreachable, they need to be
+identified first. Inside the function that identifies cycles, two doubly linked
+lists are maintained: one list contains all objects to be scanned, and the other will
+contain all objects "tentatively" unreachable.
+
+To understand how the algorithm works, let’s take the case of a circular linked list
+which has one link referenced by a variable A, and one self-referencing object which
+is completely unreachable:
+
+.. code-block:: python
+
+    >>> import gc
+
+    >>> class Link:
+    ...    def __init__(self, next_link=None):
+    ...        self.next_link = next_link
+
+    >>> link_3 = Link()
+    >>> link_2 = Link(link_3)
+    >>> link_1 = Link(link_2)
+    >>> link_3.next_link = link_1
+
+    >>> link_4 = Link()
+    >>> link_4.next_link = link_4
+
+    >>> del link_4
+    >>> gc.collect()
+    2
+
+When the GC starts, it has all the container objects it wants to scan
+on the first linked list. The objective is to move all the unreachable
+objects to the second list. Since most objects turn out to be reachable, it is much more
+efficient to move the unreachable ones, as this involves fewer pointer updates.
+
+Every object that supports garbage collection will have an extra reference
+count field (``gc_refs`` in the figures) initialized to the reference count
+of that object when the algorithm starts. This is because the algorithm needs
+to modify the reference count to do the computations and in this way the
+interpreter will not modify the real reference count field.
+
+.. figure:: images/python-cyclic-gc-1-new-page.png
+
+The GC then iterates over all containers in the first list and decrements by one the
+``gc_refs`` field of any other object that container is referencing. Doing
+this makes use of the ``tp_traverse`` slot in the container class (implemented
+using the C API or inherited from a superclass) to know what objects are referenced by
+each container. After all the objects have been scanned, only the objects that have
+references from outside the “objects to scan” list will have ``gc_refs > 0``.
+
+.. figure:: images/python-cyclic-gc-2-new-page.png
+
+Notice that having ``gc_refs == 0`` does not imply that the object is unreachable.
+This is because another object that is reachable from the outside (``gc_refs > 0``)
+can still have references to it. For instance, the ``link_2`` object in our example
+ended up having ``gc_refs == 0`` but is still referenced by the ``link_1`` object that
+is reachable from the outside. To obtain the set of objects that are really
+unreachable, the garbage collector scans the container objects again using the
+``tp_traverse`` slot with a different traverse function that marks objects with
+``gc_refs == 0`` as "tentatively unreachable" and then moves them to the
+tentatively unreachable list. The following image depicts the state of the lists in a
+moment when the GC processed the ``link 3`` and ``link 4`` objects but has not
+processed ``link 1`` and ``link 2`` yet.
+
+.. figure:: images/python-cyclic-gc-3-new-page.png
+
+Then the GC scans the next ``link 1`` object. Because it has ``gc_refs == 1``,
+the GC does not do anything special because it knows it has to be reachable (and is
+already in what will become the reachable list):
+
+.. figure:: images/python-cyclic-gc-4-new-page.png
+
+When the GC encounters an object which is reachable (``gc_refs > 0``), it traverses
+its references using the ``tp_traverse`` slot to find all the objects that are
+reachable from it, moving them to the end of the list of reachable objects (where
+they started originally) and setting their ``gc_refs`` field to 1. This is what happens
+to ``link 2`` and ``link 3`` below as they are reachable from ``link 1``. From the
+state in the previous image and after examining the objects referred to by ``link 1``
+the GC knows that ``link 3`` is reachable after all, so it is moved back to the
+original list and its ``gc_refs`` field is set to one so that if the GC visits it again, it
+knows that it is reachable. To avoid visiting an object twice, the GC marks all
+objects that have already been visited (by unsetting the ``PREV_MASK_COLLECTING`` flag)
+so if an object that has already been processed is referred to by some other object, the
+GC does not process it twice.
+
+.. figure:: images/python-cyclic-gc-5-new-page.png
+
+Notice that once an object that was marked as "tentatively unreachable" is later
+moved back to the reachable list, it will be visited again by the garbage collector
+as now all the references that that object has need to be processed as well. This
+process is really a breadth first search over the object graph. Once all the objects
+are scanned, the GC knows that all container objects in the tentatively unreachable
+list are really unreachable and can thus be garbage collected.
+
+Pragmatically, it's important to note that no recursion is required by any of this,
+and neither does it in any other way require additional memory proportional to the
+number of objects, number of pointers, or the lengths of pointer chains. Apart from
+``O(1)`` storage for internal C needs, the objects themselves contain all the storage
+the GC algorithms require.
+
+Why moving unreachable objects is better
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It sounds logical to move the unreachable objects under the premise that most objects
+are usually reachable, until you think about it: the reason it pays isn't actually
+obvious.
+
+Suppose we create objects A, B, C in that order. They appear in the young generation
+in the same order. If B points to A, and C to B, and C is reachable from outside,
+then the adjusted refcounts after the first step of the algorithm runs will be 0, 0,
+and 1 respectively because the only reachable object from the outside is C.
+
+When the next step of the algorithm finds A, A is moved to the unreachable list. The
+same for B when it's first encountered. Then C is traversed, B is moved *back* to
+the reachable list. B is eventually traversed, and then A is moved back to the reachable
+list.
+
+So instead of not moving at all, the reachable objects B and A are each moved twice.
+Why is this a win? A straightforward algorithm to move the reachable objects instead
+would move A, B, and C once each. The key is that this dance leaves the objects in
+order C, B, A - it's reversed from the original order. On all *subsequent* scans,
+none of them will move. Since most objects aren't in cycles, this can save an
+unbounded number of moves across an unbounded number of later collections. The only
+time the cost can be higher is the first time the chain is scanned.
+
+Destroying unreachable objects
+------------------------------
+
+Once the GC knows the list of unreachable objects, a very delicate process starts
+with the objective of completely destroying these objects. Roughly, the process
+follows these steps in order:
+
+1. Handle and clean weak references (if any). If an object that is in the unreachable
+   set is going to be destroyed and has weak references with callbacks, these
+   callbacks need to be honored. This process is **very** delicate as any error can
+   cause objects that will be in an inconsistent state to be resurrected or reached
+   by some Python functions invoked from the callbacks. To avoid this, weak references
+   that are also part of the unreachable set (the object and its weak reference
+   are in a cycle that is unreachable) need to be cleaned
+   immediately, without executing the callback, so that it does not trigger later
+   when the ``tp_clear`` slot is called, causing havoc. This is fine because both
+   the object and the weakref are going away, so it's legitimate to pretend the
+   weak reference is going away first so the callback is never executed.
+
+2. If an object has legacy finalizers (``tp_del`` slot) move them to the
+   ``gc.garbage`` list.
+3. Call the finalizers (``tp_finalize`` slot) and mark the objects as already
+   finalized to avoid calling them twice if they resurrect or if other finalizers
+   have removed the object first.
+4. Deal with resurrected objects. If some objects have been resurrected the GC
+   finds the new subset of objects that are still unreachable by running the cycle
+   detection algorithm again and continues with them.
+5. Call the ``tp_clear`` slot of every object so all internal links are broken and
+   the reference counts fall to 0, triggering the destruction of all unreachable
+   objects.
+
+Optimization: generations
+-------------------------
+
+In order to limit the time each garbage collection takes, the GC uses a popular
+optimization: generations. The main idea behind this concept is the assumption that
+most objects have a very short lifespan and can thus be collected shortly after their
+creation. This has proven to be very close to the reality of many Python programs as
+many temporary objects are created and destroyed very fast. The older an object is
+the less likely it is to become unreachable.
+
+To take advantage of this fact, all container objects are segregated across
+three spaces/generations. Every new
+object starts in the first generation (generation 0). The previous algorithm is
+executed only over the objects of a particular generation and if an object
+survives a collection of its generation it will be moved to the next one
+(generation 1), where it will be surveyed for collection less often. If
+the same object survives another GC round in this new generation (generation 1)
+it will be moved to the last generation (generation 2) where it will be
+surveyed the least often.
+
+Generations are collected when the number of objects that they contain reaches some
+predefined threshold, which is unique for each generation and is lower the older
+the generation is. These thresholds can be examined using the ``gc.get_threshold``
+function:
+
+.. code-block:: python
+
+   >>> import gc
+   >>> gc.get_threshold()
+   (700, 10, 10)
+
+
+The content of these generations can be examined using the
+``gc.get_objects(generation=NUM)`` function and collections can be triggered
+specifically in a generation by calling ``gc.collect(generation=NUM)``.
+
+.. code-block:: python
+
+    >>> import gc
+    >>> class MyObj:
+    ...     pass
+    ...
+
+    # Move everything to the last generation so it's easier to inspect
+    # the younger generations.
+
+    >>> gc.collect()
+    0
+
+    # Create a reference cycle.
+
+    >>> x = MyObj()
+    >>> x.self = x
+
+    # Initially the object is in the youngest generation.
+
+    >>> gc.get_objects(generation=0)
+    [..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
+
+    # After a collection of the youngest generation the object
+    # moves to the next generation.
+
+    >>> gc.collect(generation=0)
+    0
+    >>> gc.get_objects(generation=0)
+    []
+    >>> gc.get_objects(generation=1)
+    [..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
+
+
+
+Collecting the oldest generation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In addition to the various configurable thresholds, the GC only triggers a full
+collection of the oldest generation if the ratio ``long_lived_pending / long_lived_total``
+is above a given value (hardwired to 25%). The reason is that, while "non-full"
+collections (i.e., collections of the young and middle generations) will always
+examine roughly the same number of objects (determined by the aforementioned
+thresholds) the cost of a full collection is proportional to the total
+number of long-lived objects, which is virtually unbounded. Indeed, it has
+been remarked that doing a full collection every <constant number> of object
+creations entails a dramatic performance degradation in workloads which consist
+of creating and storing lots of long-lived objects (e.g. building a large list
+of GC-tracked objects would show quadratic performance, instead of linear as
+expected). Using the above ratio, instead, yields amortized linear performance
+in the total number of objects (the effect of which can be summarized thusly:
+"each full garbage collection is more and more costly as the number of objects
+grows, but we do fewer and fewer of them").
+
+Optimization: reusing fields to save memory
+-------------------------------------------
+
+In order to save memory, the two linked list pointers in every object with GC
+support are reused for several purposes. This is a common optimization known
+as "fat pointers" or "tagged pointers": pointers that carry additional data,
+"folded" into the pointer, meaning stored inline in the data representing the
+address, taking advantage of certain properties of memory addressing. This is
+possible as most architectures align certain types of data
+to the size of the data, often a word or multiple thereof. This discrepancy
+leaves a few of the least significant bits of the pointer unused, which can be
+used for tags or to keep other information – most often as a bit field (each
+bit a separate tag) – as long as code that uses the pointer masks out these
+bits before accessing memory. E.g., on a 32-bit architecture (for both
+addresses and word size), a word is 32 bits = 4 bytes, so word-aligned
+addresses are always a multiple of 4, hence end in ``00``, leaving the last 2 bits
+available; while on a 64-bit architecture, a word is 64 bits = 8 bytes, so
+word-aligned addresses end in ``000``, leaving the last 3 bits available.
+
+The CPython GC makes use of two fat pointers that correspond to the extra fields
+of ``PyGC_Head`` discussed in the `Memory layout and object structure`_ section:
+
+    .. warning::
+
+        Because of the presence of extra information, "tagged" or "fat" pointers cannot be
+        dereferenced directly and the extra information must be stripped off before
+        obtaining the real memory address. Special care needs to be taken with
+        functions that directly manipulate the linked lists, as these functions
+        normally assume the pointers inside the lists are in a consistent state.
+
+
+* The ``_gc_prev`` field is normally used as the "previous" pointer to maintain the
+  doubly linked list but its lowest two bits are used to keep the flags
+  ``PREV_MASK_COLLECTING`` and ``_PyGC_PREV_MASK_FINALIZED``. Between collections,
+  the only flag that can be present is ``_PyGC_PREV_MASK_FINALIZED`` that indicates
+  if an object has been already finalized. During collections ``_gc_prev`` is
+  temporarily used for storing a copy of the reference count (``gc_refs``), in
+  addition to two flags, and the GC linked list becomes a singly linked list until
+  ``_gc_prev`` is restored.
+
+* The ``_gc_next`` field is used as the "next" pointer to maintain the doubly linked
+  list but during collection its lowest bit is used to keep the
+  ``NEXT_MASK_UNREACHABLE`` flag that indicates if an object is tentatively
+  unreachable during the cycle detection algorithm. This is a drawback to using only
+  doubly linked lists to implement partitions: while most needed operations are
+  constant-time, there is no efficient way to determine which partition an object is
+  currently in. Instead, when that's needed, ad hoc tricks (like the
+  ``NEXT_MASK_UNREACHABLE`` flag) are employed.
+
+Optimization: delay tracking containers
+---------------------------------------
+
+Certain types of containers cannot participate in a reference cycle, and so do
+not need to be tracked by the garbage collector. Untracking these objects
+reduces the cost of garbage collections. However, determining which objects may
+be untracked is not free, and the costs must be weighed against the benefits
+for garbage collection. There are two possible strategies for when to untrack
+a container:
+
+1. When the container is created.
+2. When the container is examined by the garbage collector.
+
+As a general rule, instances of atomic types aren't tracked and instances of
+non-atomic types (containers, user-defined objects...) are. However, some
+type-specific optimizations can be present in order to suppress the garbage
+collector footprint of simple instances. Some examples of native types that
+benefit from delayed tracking:
+
+* Tuples containing only immutable objects (integers, strings etc,
+  and recursively, tuples of immutable objects) do not need to be tracked. The
+  interpreter creates a large number of tuples, many of which will not survive
+  until garbage collection. It is therefore not worthwhile to untrack eligible
+  tuples at creation time. Instead, all tuples except the empty tuple are tracked
+  when created. During garbage collection it is determined whether any surviving
+  tuples can be untracked. A tuple can be untracked if all of its contents are
+  already not tracked. Tuples are examined for untracking in all garbage collection
+  cycles. It may take more than one cycle to untrack a tuple.
+
+* Dictionaries containing only immutable objects also do not need to be tracked.
+  Dictionaries are untracked when created. If a tracked item is inserted into a
+  dictionary (either as a key or value), the dictionary becomes tracked. During a
+  full garbage collection (all generations), the collector will untrack any dictionaries
+  whose contents are not tracked.
+
+The garbage collector module provides the Python function ``is_tracked(obj)``, which returns
+the current tracking status of the object. Subsequent garbage collections may change the
+tracking status of the object.
+
+.. code-block:: python
+
+      >>> import gc
+      >>> gc.is_tracked(0)
+      False
+      >>> gc.is_tracked("a")
+      False
+      >>> gc.is_tracked([])
+      True
+      >>> gc.is_tracked({})
+      False
+      >>> gc.is_tracked({"a": 1})
+      False
+      >>> gc.is_tracked({"a": []})
+      True
diff --git a/images/python-cyclic-gc-1-new-page.png b/images/python-cyclic-gc-1-new-page.png
new file mode 100644
index 000000000..2ddac50f4
Binary files /dev/null and b/images/python-cyclic-gc-1-new-page.png differ
diff --git a/images/python-cyclic-gc-2-new-page.png b/images/python-cyclic-gc-2-new-page.png
new file mode 100644
index 000000000..159aeeb05
Binary files /dev/null and b/images/python-cyclic-gc-2-new-page.png differ
diff --git a/images/python-cyclic-gc-3-new-page.png b/images/python-cyclic-gc-3-new-page.png
new file mode 100644
index 000000000..29fab0498
Binary files /dev/null and b/images/python-cyclic-gc-3-new-page.png differ
diff --git a/images/python-cyclic-gc-4-new-page.png b/images/python-cyclic-gc-4-new-page.png
new file mode 100644
index 000000000..51a2b1065
Binary files /dev/null and b/images/python-cyclic-gc-4-new-page.png differ
diff --git a/images/python-cyclic-gc-5-new-page.png b/images/python-cyclic-gc-5-new-page.png
new file mode 100644
index 000000000..fe67a6896
Binary files /dev/null and b/images/python-cyclic-gc-5-new-page.png differ
diff --git a/index.rst b/index.rst
index f5feda008..03dcbeb8b 100644
--- a/index.rst
+++ b/index.rst
@@ -260,6 +260,7 @@ Additional Resources
     * :doc:`exploring`
     * :doc:`grammar`
     * :doc:`compiler`
+    * :doc:`garbage_collector`
 * Tool support
     * :doc:`gdb`
     * :doc:`clang`
@@ -317,6 +318,7 @@ Full Table of Contents
    exploring
    grammar
    compiler
+   garbage_collector
    extensions
    coverity
    clang
