[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Emacs Lisp, like all Lisps, has garbage collection. This means that the programmer never has to explicitly free (destroy) an object; it happens automatically when the object becomes inaccessible. Most experts agree that garbage collection is a necessity in a modern, high-level language. Its omission from C stems from the fact that C was originally designed to be a nice abstract layer on top of assembly language, for writing kernels and basic system utilities rather than large applications.
Lisp objects can be created by any of a number of Lisp primitives.
Most object types have one or a small number of basic primitives
for creating objects. For conses, the basic primitive is cons
;
for vectors, the primitives are make-vector
and vector
; for
symbols, the primitives are make-symbol
and intern
; etc.
Some Lisp objects, especially those that are primarily used internally,
have no corresponding Lisp primitives. Every Lisp object, though,
has at least one C primitive for creating it.
Recall from section (VII) that a Lisp object, as stored in a 32-bit or 64-bit word, has a few tag bits, and a “value” that occupies the remainder of the bits. We can separate the different Lisp object types into three broad categories:
GCPRO
ed.
In the remaining two categories, the type is stored in the object itself. The tag for all such objects is the generic lrecord (Lisp_Type_Record) tag. The first bytes of the object’s structure are an integer (actually a char) characterising the object’s type and some flags, in particular the mark bit used for garbage collection. A structure describing the type is accessible thru the lrecord_implementation_table indexed with said integer. This structure includes the method pointers and a pointer to a string naming the type.
malloc()
ed. These are
called lcrecords. All other types are in this category. Adding a
new type to this category is comparatively easy, and all types added
since 19.8 (when the current allocation scheme was devised, by Richard
Mlynarik), with the exception of the character type, have been in this
category.
Note that bit vectors are a bit of a special case. They are
simple lrecords as in category (b), but are individually malloc()
ed
like vectors. You can basically view them as exactly like vectors
except that their type is stored in lrecord fashion rather than
in directly-tagged fashion.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Garbage collection is simple in theory but tricky to implement. Emacs Lisp uses the oldest garbage collection method, called mark and sweep. Garbage collection begins by starting with all accessible locations (i.e. all variables and other slots where Lisp objects might occur) and recursively traversing all objects accessible from those slots, marking each one that is found. We then go through all of memory and free each object that is not marked, and unmarking each object that is marked. Note that “all of memory” means all currently allocated objects. Traversing all these objects means traversing all frob blocks, all vectors (which are chained in one big list), and all lcrecords (which are likewise chained).
Garbage collection can be invoked explicitly by calling
garbage-collect
but is also called automatically by eval
,
once a certain amount of memory has been allocated since the last
garbage collection (according to gc-cons-threshold
).
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
GCPRO
ingGCPRO
ing is one of the ugliest and trickiest parts of Emacs
internals. The basic idea is that whenever garbage collection
occurs, all in-use objects must be reachable somehow or
other from one of the roots of accessibility. The roots
of accessibility are:
staticpro()
d or
staticpro_nodump()
ed. This is used for any global C variables
that hold Lisp objects. A call to staticpro()
happens implicitly
as a result of any symbols declared with defsymbol()
and any
variables declared with DEFVAR_FOO()
. You need to explicitly
call staticpro()
(in the vars_of_foo()
method of a module)
for other global C variables holding Lisp objects. (This typically
includes internal lists and such things.). Use
staticpro_nodump()
only in the rare cases when you do not want
the pointed variable to be saved at dump time but rather recompute it at
startup.
Note that obarray
is one of the staticpro()
d things.
Therefore, all functions and variables get marked through this.
specpdl
stack.
GCPRO
.
Marking with GCPRO
is necessary because some C functions (quite
a lot, in fact), allocate objects during their operation. Quite
frequently, there will be no other pointer to the object while the
function is running, and if a garbage collection occurs and the object
needs to be referenced again, bad things will happen. The solution is
to mark those references with GCPRO
. Note that it is a
reference that is marked with GCPRO
, not an object. If
you declare a Lisp_Object
variable, assign to it, GCPRO
it, and then assign to it again, the first object assigned is not
protected, while the second object is protected. Unfortunately
GCPRO
ing is easy to forget, and there is basically no way around
this problem. Here are some rules, though:
Feval
, or
whenever a QUIT
can occur where execution can continue past
this. (Remember, this is almost anywhere.) Note that Fsignal
can
GC, and it can return (even though it normally doesn’t). This means
that you must GCPRO
before calling most of the error functions,
including the ‘CONCHECK’ family of macros, if references occur
after the call.
UNGCPRO
anything that’s GCPRO
ed, and you
must not UNGCPRO
if you haven’t GCPRO
ed. Getting
either of these wrong will lead to crashes, often in completely random
places unrelated to where the problem lies. There are some functions
(Fsignal
is the canonical example) which may or may not return.
In these cases, the function is responsible for cleaning up the
GCPRO
s if it doesn’t return, so you should treat it as an
ordinary function.
GCPROn
, there have to be declarations of
struct gcpro gcpro1, gcpro2, ..., gcpron
.
GCPRO
s
are chained through the struct gcpro
local variables, with the
variable ‘gcprolist’ pointing to the head of the list and the nth
local gcpro
variable pointing to the first gcpro
variable
in the next enclosing stack frame. Each GCPRO
ed thing is an
lvalue, and the struct gcpro
local variable contains a pointer to
this lvalue. This is why things will mess up badly if you don’t pair up
the GCPRO
s and UNGCPRO
s—you will end up with
gcprolist
s containing pointers to struct gcpro
s or local
Lisp_Object
variables in no-longer-active stack frames.
struct gcpro
to
protect a contiguous array of any number of values, rather than
just a single lvalue. To effect this, call GCPROn
as usual on
the first object in the array and then set gcpron.nvars
.
XSTRING_DATA()
is liable to change at any
time, and you should never keep it around past any function call, or
pass it as an argument to any function that might cause a garbage
collection. This is why a number of functions accept either a
“non-relocatable” char *
pointer or a relocatable Lisp string,
and only access the Lisp string’s data at the very last minute. In some
cases, you may end up having to alloca()
some space and copy the
string’s data into it.
GCPRO
’s, use NGCPROn
(along with struct gcpro ngcpro1, ngcpro2
, etc.), NNGCPROn
,
etc. This avoids compiler warnings about shadowed locals.
GCPRO
s
rather than too few. The extra cycles spent on this are
almost never going to make a whit of difference in the
speed of anything.
GCPRO
s.
That is, you should not have to explicitly GCPRO
any Lisp objects
that are passed in as parameters.
One exception from this rule is if you ever plan to change the parameter
value, and store a new object in it. In that case, you must
GCPRO
the parameter, because otherwise the new object will not be
protected.
So, if you create any Lisp objects (remember, this happens in all sorts
of circumstances, e.g. with Fcons()
, etc.), you are responsible
for GCPRO
ing them, unless you are absolutely sure that
there’s no possibility that a garbage-collection can occur while you
need to use the object. Even then, consider GCPRO
ing.
GCPRO
, you should GCPRO
.
GCPRO
ing something that is uninitialized. If you have
any shade of doubt about this, initialize all your variables to Qnil
.
Fcons()
in the argument to
another function. By the “caller protects” law, you should be
GCPRO
ing the newly-created cons, but you aren’t. A certain
number of functions that are commonly called on freshly created stuff
(e.g. nconc2()
, Fsignal()
), break the “caller protects”
law and go ahead and GCPRO
their arguments so as to simplify
things, but make sure and check if it’s OK whenever doing something like
this.
GCPRO
! Bugs resulting from insufficient
GCPRO
ing are intermittent and extremely difficult to track down,
often showing up in crashes inside of garbage-collect
or in
weirdly corrupted objects or even in incorrect values in a totally
different section of code.
If you don’t understand whether to GCPRO
in a particular
instance, ask on the mailing lists. A general hint is that prog1
is the canonical example.
Given the extremely error-prone nature of the GCPRO
scheme, and
the difficulties in tracking down, it should be considered a deficiency
in the XEmacs code. A solution to this problem would involve
implementing so-called conservative garbage collection for the C
stack. That involves looking through all of stack memory and treating
anything that looks like a reference to an object as a reference. This
will result in a few objects not getting collected when they should, but
it obviates the need for GCPRO
ing, and allows garbage collection
to happen at any point at all, such as during object allocation.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
20.4.1 Invocation | ||
20.4.2 garbage_collect_1 | ||
20.4.3 mark_object | ||
20.4.4 gc_sweep | ||
20.4.5 sweep_lcrecords_1 | ||
20.4.6 compact_string_chars | ||
20.4.7 sweep_strings | ||
20.4.8 sweep_bit_vectors_1 |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The first thing that anyone should know about garbage collection is: when and how the garbage collector is invoked. One might think that this could happen every time new memory is allocated, e.g. new objects are created, but this is not the case. Instead, we have the following situation:
The entry point of any process of garbage collection is an invocation
of the function garbage_collect_1
in file alloc.c
. The
invocation can occur explicitly by calling the function
Fgarbage_collect
(in addition this function provides information
about the freed memory), or can occur implicitly in four different
situations:
main_1
in file emacs.c
. This function is called
at each startup of xemacs. The garbage collection is invoked after all
initial creations are completed, but only if a special internal error
checking-constant ERROR_CHECK_GC
is defined.
disksave_object_finalization
in file
alloc.c
. The only purpose of this function is to clear the
objects from memory which need not be stored with xemacs when we dump out
an executable. This is only done by Fdump_emacs
or by
Fdump_emacs_data
respectively (both in emacs.c
). The
actual clearing is accomplished by making these objects unreachable and
starting a garbage collection. The function is only used while building
xemacs.
Feval / eval
in file eval.c
. Each time the
well known and often used function eval is called to evaluate a form,
one of the first things that could happen, is a potential call of
garbage_collect_1
. There exist three global variables,
consing_since_gc
(counts the created cons-cells since the last
garbage collection), gc_cons_threshold
(a specified threshold
after which a garbage collection occurs) and always_gc
. If
always_gc
is set or if the threshold is exceeded, the garbage
collection will start.
Ffuncall / funcall
in file eval.c
. This
function evaluates calls of elisp functions and works according to
Feval
.
The upshot is that garbage collection can basically occur everywhere
Feval
, respectively Ffuncall
, is used - either directly or
through another function. Since calls to these two functions are hidden
in various other functions, many calls to garbage_collect_1
are
not obviously foreseeable, and therefore unexpected. Instances where
they are used that are worth remembering are various elisp commands, as
for example or
, and
, if
, cond
, while
,
setq
, etc., miscellaneous gui_item_...
functions,
everything related to eval
(Feval_buffer
, call0
,
...) and inside Fsignal
. The latter is used to handle signals, as
for example the ones raised by every QUIT
-macro triggered after
pressing Ctrl-g.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
garbage_collect_1
We can now describe exactly what happens after the invocation takes place.
gc_in_progress
), when
the garbage collection is somehow forbidden
(gc_currently_forbidden
), when we are currently displaying something
(in_display
) or when we are preparing for the armageddon of the
whole system (preparing_for_armageddon
).
pre_gc_cursor
and cursor_changed
take
care of that.
gc_currently_forbidden
must be restored after
the garbage collection, no matter what happens during the process. We
accomplish this by record_unwind_protect
ing the suitable function
restore_gc_inhibit
together with the current value of
gc_currently_forbidden
.
gc_in_progress
is set.
clear_event_resource
) and for specifiers
(cleanup_specifiers
).
mark_object
is called for each root individually to go out from
there to mark all reachable objects. All roots that are traversed are
shown in their processed order:
staticpro
in the dynarr staticpros
.
See section Adding Global Lisp Variables.
gcprolist
.
See section GCPRO
ing.
symbol
and old
values old_values
) that are bound during the evaluation by the Lisp
engine. They are stored in specbinding
structs pushed on a stack
called specpdl
.
See section Dynamic Binding; The specbinding Stack; Unwind-Protects.
catchtag
inserted in the list
catchlist
. Their tag (tag
) and value (val
fields
are freshly created objects and therefore have to be marked.
See section Catch and Throw.
backtrace
on the call stack of the Lisp engine (backtrace_list
). The unique
parts that have to be marked are the fields for each function
(function
) and all their arguments (args
).
See section Evaluation.
mark_redisplay
(in
redisplay.c
).
mark_profiling_info
make-hash-table
.
Because there are complicated dependency rules about when and what to
mark while processing weak hash tables, the standard marker
method is only active if it is marking non-weak hash tables. As soon as
a weak component is in the table, the hash table entries are ignored
while marking. Instead their marking is done each separately by the
function finish_marking_weak_hash_tables
. This function iterates
over each hash table entry hentries
for each weak hash table in
Vall_weak_hash_tables
. Depending on the type of a table, the
appropriate action is performed.
If a table is acting as HASH_TABLE_KEY_WEAK
, and a key already marked,
everything reachable from the value
component is marked. If it is
acting as a HASH_TABLE_VALUE_WEAK
and the value component is
already marked, the marking starts beginning only from the
key
component.
If it is a HASH_TABLE_KEY_CAR_WEAK
and the car
of the key entry is already marked, we mark both the key
and
value
components.
Finally, if the table is of the type HASH_TABLE_VALUE_CAR_WEAK
and the car of the value components is already marked, again both the
key
and the value
components get marked.
Again, there are lists with comparable properties called weak
lists. There exist different peculiarities of their types called
simple
, assoc
, key-assoc
and
value-assoc
. You can find further details about them in the
description to the function make-weak-list
. The scheme of their
marking is similar: all weak lists are listed in Qall_weak_lists
,
therefore we iterate over them. The marking is advanced until we hit an
already marked pair. Then we know that during a former run all
the rest has been marked completely. Again, depending on the special
type of the weak list, our jobs differ. If it is a WEAK_LIST_SIMPLE
and the elem is marked, we mark the cons
part. If it is a
WEAK_LIST_ASSOC
and not a pair or a pair with both marked car and
cdr, we mark the cons
and the elem
. If it is a
WEAK_LIST_KEY_ASSOC
and not a pair or a pair with a marked car of
the elem, we mark the cons
and the elem
. Finally, if it is
a WEAK_LIST_VALUE_ASSOC
and not a pair or a pair with a marked
cdr of the elem, we mark both the cons
and the elem
.
Since, by marking objects in reach from weak hash tables and weak lists, other objects could get marked, this perhaps implies further marking of other weak objects, both finishing functions are redone as long as yet unmarked objects get freshly marked.
The function prune_weak_hash_tables
does the job for weak hash
tables. Totally unmarked hash tables are removed from the list
Vall_weak_hash_tables
. The other ones are treated more carefully
by scanning over all entries and removing one as soon as one of
the components key
and value
is unmarked.
The same idea applies to the weak lists. It is accomplished by
prune_weak_lists
: An unmarked list is pruned from
Vall_weak_lists
immediately. A marked list is treated more
carefully by going over it and removing just the unmarked pairs.
prune_specifiers
checks all listed specifiers held
in Vall_specifiers
and removes the ones from the lists that are
unmarked.
Vall_syntax_tables
. The function prune_syntax_tables
walks
through it and unlinks the tables that are unmarked.
gc_sweep
which holds the predominance.
consing_since_gc
- the counter of the created cells since
the last garbage collection - is set back to 0, and
gc_in_progress
is not true
anymore.
gc_inhibit
is restored to the former value by
unwinding the stack.
breathing_space
. If nothing more is left, we create a new reserve
and exit.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
mark_object
The first thing that is checked while marking an object is whether the
object is a real Lisp object Lisp_Type_Record
or just a fixnum
or a character. Fixnums and characters are the only two types that are
stored directly - without another level of indirection, and therefore they
don’t have to be marked and collected.
See section How Lisp Objects Are Represented in C.
The second case is the one we have to handle. It is the one when we are
dealing with a pointer to a Lisp object. But, there exist also three
possibilities, that prevent us from doing anything while marking: The
object is read only which prevents it from being garbage collected,
i.e. marked (C_READONLY_RECORD_HEADER
). The object in question is
already marked, and need not be marked for the second time (checked by
MARKED_RECORD_HEADER_P
). If it is a special, unmarkable object
(UNMARKABLE_RECORD_HEADER_P
, apparently, these are objects that
sit in some const space, and can therefore not be marked, see
this_one_is_unmarkable
in alloc.c
).
Now, the actual marking is feasible. We do so by once using the macro
MARK_RECORD_HEADER
to mark the object itself (actually the
special flag in the lrecord header), and calling its special marker
“method” marker
if available. The marker method marks every
other object that is in reach from our current object. Note, that these
marker methods should not call mark_object
recursively, but
instead should return the next object from where further marking has to
be performed.
In case another object was returned, as mentioned before, we reiterate
the whole mark_object
process beginning with this next object.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
gc_sweep
The job of this function is to free all unmarked records from memory. As we know, there are different types of objects implemented and managed, and consequently different ways to free them from memory. See section Introduction to Allocation.
We start with all objects stored through lcrecords
. All
bulkier objects are allocated and handled using that scheme of
lcrecords
. Each object is malloc
ed separately
instead of placing it in one of the contiguous frob blocks. All types
that are currently stored
using lcrecords
’s alloc_lcrecord
and
make_lcrecord_list
are the types: vectors, buffers,
char-table, char-table-entry, console, weak-list, database, device,
ldap, hash-table, command-builder, extent-auxiliary, extent-info, face,
coding-system, frame, image-instance, glyph, popup-data, gui-item,
keymap, charset, color_instance, font_instance, opaque, opaque-list,
process, range-table, specifier, symbol-value-buffer-local,
symbol-value-lisp-magic, symbol-value-varalias, toolbar-button,
tooltalk-message, tooltalk-pattern, window, and window-configuration. We
take care of them in the fist place
in order to be able to handle and to finalize items stored in them more
easily. The function sweep_lcrecords_1
as described below is
doing the whole job for us.
For a description about the internals: See section lrecords.
Our next candidates are the other objects that behave quite differently
than everything else: the strings. They consists of two parts, a
fixed-size portion (struct Lisp_String
) holding the string’s
length, its property list and a pointer to the second part, and the
actual string data, which is stored in string-chars blocks comparable to
frob blocks. In this block, the data is not only freed, but also a
compression of holes is made, i.e. all strings are relocated together.
See section String. This compacting phase is performed by the function
compact_string_chars
, the actual sweeping by the function
sweep_strings
is described below.
After that, the other types are swept step by step using functions
sweep_conses
, sweep_bit_vectors_1
,
sweep_compiled_functions
, sweep_floats
,
sweep_symbols
, sweep_extents
, sweep_markers
and
sweep_extents
. They are the fixed-size types cons, floats,
compiled-functions, symbol, marker, extent, and event stored in
so-called “frob blocks”, and therefore we can basically do the same on
every type objects, using the same macros, especially defined only to
handle everything with respect to fixed-size blocks. The only fixed-size
type that is not handled here are the fixed-size portion of strings,
because we took special care of them earlier.
The only big exceptions are bit vectors stored differently and
therefore treated differently by the function sweep_bit_vectors_1
described later.
At first, we need some brief information about how
these fixed-size types are managed in general, in order to understand
how the sweeping is done. They have all a fixed size, and are therefore
stored in big blocks of memory - allocated at once - that can hold a
certain amount of objects of one type. The macro
DECLARE_FIXED_TYPE_ALLOC
creates the suitable structures for
every type. More precisely, we have the block struct
(holding a pointer to the previous block prev
and the
objects in block[]
), a pointer to current block
(current_..._block)
) and its last index
(current_..._block_index
), and a pointer to the free list that
will be created. Also a macro FIXED_TYPE_FROM_BLOCK
plus some
related macros exists that are used to obtain a new object, either from
the free list ALLOCATE_FIXED_TYPE_1
if there is an unused object
of that type stored or by allocating a completely new block using
ALLOCATE_FIXED_TYPE_FROM_BLOCK
.
The rest works as follows: all of them define a
macro UNMARK_...
that is used to unmark the object. They define a
macro ADDITIONAL_FREE_...
that defines additional work that has
to be done when converting an object from in use to not in use (so far,
only markers use it in order to unchain them). Then, they all call
the macro SWEEP_FIXED_TYPE_BLOCK
instantiated with their type name
and their struct name.
This call in particular does the following: we go over all blocks
starting with the current moving towards the oldest.
For each block, we look at every object in it. If the object already
freed (checked with FREE_STRUCT_P
using the first pointer of the
object), or if it is
set to read only (C_READONLY_RECORD_HEADER_P
, nothing must be
done. If it is unmarked (checked with MARKED_RECORD_HEADER_P
), it
is put in the free list and set free (using the macro
FREE_FIXED_TYPE
, otherwise it stays in the block, but is unmarked
(by UNMARK_...
). While going through one block, we note if the
whole block is empty. If so, the whole block is freed (using
xfree
) and the free list state is set to the state it had before
handling this block.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
sweep_lcrecords_1
After nullifying the complete lcrecord statistics, we go over all
lcrecords two separate times. They are all chained together in a list with
a head called all_lcrecords
.
The first loop calls for each object its finalizer
method, but only
in the case that it is not read only
(C_READONLY_RECORD_HEADER_P)
, it is not already marked
(MARKED_RECORD_HEADER_P
), it is not already in a free list (list of
freed objects, field free
) and finally it owns a finalizer
method.
The second loop actually frees the appropriate objects again by iterating
through the whole list. In case an object is read only or marked, it
has to persist, otherwise it is manually freed by calling
xfree
. During this loop, the lcrecord statistics are kept up to
date by calling tick_lcrecord_stats
with the right arguments,
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
compact_string_chars
The purpose of this function is to compact all the data parts of the
strings that are held in so-called string_chars_block
, i.e. the
strings that do not exceed a certain maximal length.
The procedure with which this is done is as follows. We are keeping two
positions in the string_chars_block
s using two pointer/integer
pairs, namely from_sb
/from_pos
and
to_sb
/to_pos
. They stand for the actual positions, from
where to where, to copy the actually handled string.
While going over all chained string_char_block
s and their held
strings, staring at first_string_chars_block
, both pointers
are advanced and eventually a string is copied from from_sb
to
to_sb
, depending on the status of the pointed at strings.
More precisely, we can distinguish between the following actions.
from_sb
’s position could be marked as free, which
is indicated by an invalid pointer to the pointer that should point back
to the fixed size string object, and which is checked by
FREE_STRUCT_P
. In this case, the from_sb
/from_pos
is advanced to the next string, and nothing has to be copied.
from_sb
/from_pos
pair as described above.
to_sb
-block, we advance
this pointer to the beginning of the next block before copying. In case the
from and to positions are different, we perform the
actual copying using the library function memmove
.
After compacting, the pointer to the current
string_chars_block
, sitting in current_string_chars_block
,
is reset on the last block to which we moved a string,
i.e. to_block
, and all remaining blocks (we know that they just
carry garbage) are explicitly xfree
d.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
sweep_strings
The sweeping for the fixed sized string objects is essentially exactly
the same as it is for all other fixed size types. As before, the freeing
into the suitable free list is done by using the macro
SWEEP_FIXED_SIZE_BLOCK
after defining the right macros
UNMARK_string
and ADDITIONAL_FREE_string
. These two
definitions are a little bit special compared to the ones used
for the other fixed size types.
UNMARK_string
is defined the same way except some additional code
used for updating the bookkeeping information.
For strings, ADDITIONAL_FREE_string
has to do something in
addition: in case, the string was not allocated in a
string_chars_block
because it exceeded the maximal length, and
therefore it was malloc
ed separately, we know also xfree
it explicitly.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
sweep_bit_vectors_1
Bit vectors are also one of the rare types that are malloc
ed
individually. Consequently, while sweeping, all further needless
bit vectors must be freed by hand. This is done, as one might imagine,
the expected way: since they are all registered in a list called
all_bit_vectors
, all elements of that list are traversed,
all unmarked bit vectors are unlinked by calling xfree
and all of
them become unmarked.
In addition, the bookkeeping information used for garbage
collector’s output purposes is updated.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Fixnum and character Lisp objects are created from C integers using the
functions make_fixnum()
and make_char()
. (These are actually
macros on most systems.) These functions basically just do some moving
of bits around, since the integral value of the object is stored
directly in the Lisp_Object
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The uninitialized memory required by a Lisp_Object
of a
particular type is allocated using ALLOCATE_FIXED_TYPE()
. This
only occurs inside of the lowest-level object-creating functions in
‘alloc.c’: Fcons()
, make_float()
,
Fmake_byte_code()
, Fmake_symbol()
,
allocate_extent()
, allocate_event()
,
Fmake_marker()
, and make_uninit_string()
. The idea is
that, for each type, there are a number of frob blocks (each 2K in
size); each frob block is divided up into object-sized chunks. Each
frob block will have some of these chunks that are currently assigned
to objects, and perhaps some that are free. (If a frob block has
nothing but free chunks, it is freed at the end of the garbage
collection cycle.) The free chunks are stored in a free list, which
is chained by storing a pointer in the first four bytes of the
chunk. (Except for the free chunks at the end of the last frob block,
which are handled using an index which points past the end of the
last-allocated chunk in the last frob block.)
ALLOCATE_FIXED_TYPE()
first tries to retrieve a chunk from the
free list; if that fails, it calls
ALLOCATE_FIXED_TYPE_FROM_BLOCK()
, which looks at the end of the
last frob block for space, and creates a new frob block if there is
none. (There are actually two versions of these macros, one of which
is more defensive but less efficient and is used for error-checking.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
[see ‘lrecord.h’]
This node needs updating for the “new garbage collection algorithms” (KKCC) and the “incremental” collector.
All lrecords have at the beginning of their structure a struct
lrecord_header
. This just contains a type number and some flags,
including the mark bit. All builtin type numbers are defined as
constants in enum lrecord_type
, to allow the compiler to generate
more efficient code for typeP
. The type number, thru the
lrecord_implementation_table
, gives access to a struct
lrecord_implementation
, which is a structure containing method pointers
and such. There is one of these for each type, and it is a global,
constant, statically-declared structure that is declared in the
DEFINE_*_LISP_OBJECT()
macro.
Frob-block lrecords just have a struct lrecord_header
at their
beginning. lcrecords, however, actually have a
struct old_lcrecord_header
. This, in turn, has a struct
lrecord_header
at its beginning, so sanity is preserved; but it also
has a pointer used to chain all lcrecords together.
lcrecords are now obsolete when using the write-barrier-based collector.
Frob-block objects are created using ALLOC_FROB_BLOCK_LISP_OBJECT()
.
All this does is call ALLOCATE_FIXED_TYPE()
to allocate an
object, and set_lheader_implementation()
to initialize the header.
Normal objects (i.e. lcrecords) are created using
ALLOC_NORMAL_LISP_OBJECT()
, which takes a type name (resolved
internally to a structure named lrecord_foo
for type
foo
). If they are of variable size, however, they are created
with ALLOC_SIZED_LISP_OBJECT()
, which takes a size to allocate
in addition to a type. This basically just malloc()
s the
storage, initializes the struct lcrecord_header
, and chains the
lcrecord onto the head of the list of all lcrecords, which is stored
in the variable all_lcrecords
. The calls to the above
allocation macros generally occur in the lowest-level allocation
function for each lrecord type.
Whenever you create a normal object, you need to call one of the
DEFINE_*_LISP_OBJECT()
macros. This needs to be
specified in a ‘.c’ file, at the top level. What this actually
does is define and initialize the implementation structure for the
lrecord. (And possibly declares a function error_check_foo()
that
implements the XFOO()
macro when error-checking is enabled.) The
arguments to the macros are the actual type name (this is used to
construct the C variable name of the lrecord implementation structure
and related structures using the ‘##’ macro concatenation
operator), a string that names the type on the Lisp level (this may not
be the same as the C type name; typically, the C type name has
underscores, while the Lisp string has dashes), various method pointers,
and the name of the C structure that contains the object. The methods
are used to encapsulate type-specific information about the object, such
as how to print it or mark it for garbage collection, so that it’s easy
to add new object types without having to add a specific case for each
new type in a bunch of different places.
The various macros for defining Lisp objects are as follows:
DEFINE_*_LISP_OBJECT
is for objects with constant size. (Either
DEFINE_DUMPABLE_LISP_OBJECT
for objects that can be saved in a
dumped executable, or DEFINE_NODUMP_LISP_OBJECT
for objects
that cannot be saved – e.g. that contain pointers to non-persistent
external objects such as window-system windows.)
DEFINE_*_SIZABLE_LISP_OBJECT
is for objects whose size varies.
This includes some simple types such as vectors, bit vectors and
opaque objects, as well complex types, especially types such as
specifiers, lstreams or coding systems that have subtypes and include
subtype-specific data attached to the end of the structure.
Variable-size objects have an extra method that returns the size of
the object. This is not used at allocation (rather, the size is
specified in the call to the allocation macro), but is used for
operations such as copying a Lisp object, as well as for keeping
allocation statistics.
DEFINE_*_FROB_BLOCK_LISP_OBJECT
is for objects that are
allocated in large blocks (“frob blocks”), which are parceled up
individually. Such objects need special handling in ‘alloc.c’.
This does not apply to NEW_GC, because it does this automatically.
DEFINE_*_INTERNAL_LISP_OBJECT
is for “internal” objects that
should never be visible on the Lisp level. This is a shorthand for
the most common type of internal objects, which have no equal or hash
method (since they generally won’t appear in hash tables), no
finalizer and internal_object_printer()
as their print method
(which prints that the object is internal and shouldn’t be visible
externally). For internal objects needing a finalizer, equal or hash
method, or wanting to customize the print method, use the normal
DEFINE_*_LISP_OBJECT
mechanism for defining these objects.
DEFINE_*_GENERAL_LISP_OBJECT
is for objects that need to
provide one of the less common methods that are omitted on most
objects. These methods include the methods supporting the unified
property interface using get
, put
, remprop
and
object-plist
, and (for dumpable objects only) the
disksaver
method.
DEFINE_MODULE_*
is for objects defined in an external module.
MAKE_LISP_OBJECT
and MAKE_MODULE_LISP_OBJECT
are what
underlies all of these; they define a structure containing pointers to
object methods and other info such as the size of the structure
containing the object.
For the purpose of keeping allocation statistics, the allocation
engine keeps a list of all the different types that exist. Note that,
since DEFINE_*_LISP_OBJECT()
is a macro that is
specified at top-level, there is no way for it to initialize the
global data structures containing type information, like
lrecord_implementations_table
. For this reason a call to
INIT_LISP_OBJECT()
must be added to the same source
file containing DEFINE_*_LISP_OBJECT()
, but instead of
to the top level, to one of the init functions, typically
syms_of_foo.c
. INIT_LISP_OBJECT()
must
be called before an object of this type is used.
The type number is also used to index into an array holding the number
of objects of each type and the total memory allocated for objects of
that type. The statistics in this array are computed during the sweep
stage. These statistics are returned by the call to
garbage-collect
.
Note that for every type defined with a DEFINE_*_LISP_OBJECT()
macro, there needs to be a DECLARE_LISP_OBJECT()
somewhere in a
‘.h’ file, and this ‘.h’ file needs to be included by
‘inline.c’.
Furthermore, there should generally be a set of XFOOBAR()
,
FOOBARP()
, etc. macros in a ‘.h’ (or occasionally
‘.c’) file. To create one of these, copy an existing model and
modify as necessary.
Please note: If you define an lrecord in an external
dynamically-loaded module, you must use
DECLARE_MODULE_LISP_OBJECT()
,
DEFINE_MODULE_*_LISP_OBJECT()
, and
INIT_MODULE_LISP_OBJECT()
instead of the non-MODULE
forms. These macros will dynamically add new type numbers to the
global enum that records them, whereas the non-MODULE forms assume
that the programmer has already inserted the correct type numbers into
the enum’s code at compile-time.
The various methods in the lrecord implementation structure are:
mark_object()
function), which is
used to mark an object. All Lisp objects that are contained within the
object need to be marked by applying this function to them. The mark
method should also return a Lisp object, which should be either nil
or
an object to mark. (This can be used in lieu of calling
mark_object()
on the object, to reduce the recursion depth, and
consequently should be the most heavily nested sub-object, such as a
long list.)
Please note: When the mark method is called, garbage collection is in progress, and special precautions need to be taken when accessing objects; see section (B) above.
If your mark method does not need to do anything, it can be
NULL
.
princ
, prin1
, or the like is
called. It is passed the object, a stream to which the output is to be
directed, and an escapeflag
which indicates whether the object’s
printed representation should be escaped so that it is
readable. (This corresponds to the difference between princ
and
prin1
.) Basically, escaped means that strings will have
quotes around them and confusing characters in the strings such as
quotes, backslashes, and newlines will be backslashed; and that special
care will be taken to make symbols print in a readable fashion
(e.g. symbols that look like numbers will be backslashed). Other
readable objects should perhaps pass escapeflag
on when
sub-objects are printed, so that readability is preserved when necessary
(or if not, always pass in a 1 for escapeflag
). Non-readable
objects should in general ignore escapeflag
, except that some use
it as an indication that more verbose output should be given.
Sub-objects are printed using print_internal()
, which takes
exactly the same arguments as are passed to the print method.
Literal C strings should be printed using write_cistring()
,
or write_string_1()
for non-null-terminated strings.
Functions that do not have a readable representation should check the
print_readably
flag and signal an error if it is set.
If you specify NULL for the print method, the
default_object_printer()
will be used.
malloc()
ed memory associated with the object, releasing any
operating-system and window-system resources associated with the object
(e.g. pixmaps, fonts), etc.
The finalize method can be NULL if nothing needs to be done.
Finalize methods should, as a rule, set to zero any pointers after they’ve been freed, and check to make sure pointers are not zero before freeing. Although I’m pretty sure that finalize methods are not called twice on the same object, we’ve gotten nastily burned in some cases by not doing this.
WARNING #1: The finalize method is only called for
normal objects, not for frob-block objects. If you need a
finalize method for frob-block objects, you have to stick
it in the ADDITIONAL_FREE_foo()
macro in ‘alloc.c’.
WARNING #2: Things are in an extremely bizarre state
when ADDITIONAL_FREE_foo()
is called, so you have to
be incredibly careful when writing one of these functions.
See the comment in gc_sweep()
. If you ever have to add
one of these, consider using an lcrecord or dealing with
the problem in a different fashion.
equal
is called. It should compare the contents of the
objects in some reasonable fashion. It is passed the two objects and a
depth value, which is used to catch circular objects. To compare
sub-Lisp-objects, call internal_equal()
and bump the depth value
by one. If this value gets too high, a circular-object
error
will be signaled.
If this is NULL, objects are equal
only when they are eq
,
i.e. identical.
equal
. The rule here is that if two objects are
equal
, they must hash to the same value; i.e. your hash
function should use some subset of the sub-fields of the object that are
compared in the “equal” method. If you specify this method as
NULL
, the object’s pointer will be used as the hash, which will
fail if the object has an equal
method, so don’t do this.
To hash a sub-Lisp-object, call internal_hash()
. Bump the
depth by one, just like in the “equal” method.
To convert a Lisp object directly into a hash value (using
its pointer), use LISP_HASH()
. This is what happens when
the hash method is NULL.
To hash two or more values together into a single value, use
HASH2()
, HASH3()
, HASH4()
, etc.
get
, put
, remprop
, and object-plist
,
respectively are called on the object. If you create one of these
objects, you have to use a different macro to define them,
i.e. DEFINE_*_GENERAL_LISP_OBJECT()
.
DEFINE_*_SIZABLE_*_LISP_OBJECT
macro.)
This should simply return the object’s size in bytes, exactly as you
might expect. For an example, see the methods for lstreams and opaques.
It seems that even non-dumpable objects may be around at dump time, and a disksaver may be provided. (In fact, the only object currently with a disksaver, lstream, is non-dumpable.)
Objects rarely need to provide this method; most of the time it will
be NULL. If you want to provide this method, you have to use the
DEFINE_*_GENERAL_LISP_OBJECT()
macro to define your object.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Memory that you want to allocate directly should be allocated using
xmalloc()
rather than malloc()
. This implements
error-checking on the return value, and once upon a time did some more
vital stuff (i.e. BLOCK_INPUT
, which is no longer necessary).
Free using xfree()
, and realloc using xrealloc()
. Note
that xmalloc()
will do a non-local exit if the memory can’t be
allocated. (Many functions, however, do not expect this, and thus XEmacs
will likely crash if this happens. This is a bug. If you can,
you should strive to make your function handle this OK. However, it’s
difficult in the general circumstance, perhaps requiring extra
unwind-protects and such.)
Note that XEmacs provides two separate replacements for the standard
malloc()
library function. These are called old GNU malloc
(‘malloc.c’) and new GNU malloc (‘gmalloc.c’),
respectively. New GNU malloc is better in pretty much every way than
old GNU malloc, and should be used if possible. (It used to be that on
some systems, the old one worked but the new one didn’t. I think this
was due specifically to a bug in SunOS, which the new one now works
around; so I don’t think the old one ever has to be used any more.) The
primary difference between both of these mallocs and the standard system
malloc is that they are much faster, at the expense of increased space.
The basic idea is that memory is allocated in fixed chunks of powers of
two. This allows for basically constant malloc time, since the various
chunks can just be kept on a number of free lists. (The standard system
malloc typically allocates arbitrary-sized chunks and has to spend some
time, sometimes a significant amount of time, walking the heap looking
for a free block to use and cleaning things up.) The new GNU malloc
improves on things by allocating large objects in chunks of 4096 bytes
rather than in ever larger powers of two, which results in ever larger
wastage. There is a slight speed loss here, but it’s of doubtful
significance.
NOTE: Apparently there is a third-generation GNU malloc that is significantly better than the new GNU malloc, and should probably be included in XEmacs.
There is also the relocating allocator, ‘ralloc.c’. This actually
moves blocks of memory around so that the sbrk()
pointer shrunk
and virtual memory released back to the system. On some systems,
this is a big win. On all systems, it causes a noticeable (and
sometimes huge) speed penalty, so I turn it off by default.
‘ralloc.c’ only works with the new GNU malloc in ‘gmalloc.c’.
There are also two versions of ‘ralloc.c’, one that uses mmap()
rather than block copies to move data around. This purports to
be faster, although that depends on the amount of data that would
have had to be block copied and the system-call overhead for
mmap()
. I don’t know exactly how this works, except that the
relocating-allocation routines are pretty much used only for
the memory allocated for a buffer, which is the biggest consumer
of space, esp. of space that may get freed later.
Note that the GNU mallocs have some “memory warning” facilities. XEmacs taps into them and issues a warning through the standard warning system, when memory gets to 75%, 85%, and 95% full. (On some systems, the memory warnings are not functional.)
Allocated memory that is going to be used to make a Lisp object
is created using allocate_lisp_storage()
. This just calls
xmalloc()
. It used to verify that the pointer to the memory can
fit into a Lisp word, before the current Lisp object representation was
introduced. allocate_lisp_storage()
is called by
alloc_lcrecord()
, ALLOCATE_FIXED_TYPE()
, and the vector
and bit-vector creation routines. These routines also call
INCREMENT_CONS_COUNTER()
at the appropriate times; this keeps
statistics on how much memory is allocated, so that garbage-collection
can be invoked when the threshold is reached.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Conses are allocated in standard frob blocks. The only thing to
note is that conses can be explicitly freed using free_cons()
and associated functions free_list()
and free_alist()
. This
immediately puts the conses onto the cons free list, and decrements
the statistics on memory allocation appropriately. This is used
to good effect by some extremely commonly-used code, to avoid
generating extra objects and thereby triggering GC sooner.
However, you have to be extremely careful when doing this.
If you mess this up, you will get BADLY BURNED, and it has happened
before.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As mentioned above, each vector is malloc()
ed individually, and
all are threaded through the variable all_vectors
. Vectors are
marked strangely during garbage collection, by kludging the size field.
Note that the struct Lisp_Vector
is declared with its
contents
field being a stretchy array of one element. It
is actually malloc()
ed with the right size, however, and access
to any element through the contents
array works fine.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Bit vectors work exactly like vectors, except for more complicated code to access an individual bit, and except for the fact that bit vectors are lrecords while vectors are not. (The only difference here is that there’s an lrecord implementation pointer at the beginning and the tag field in bit vector Lisp words is “lrecord” rather than “vector”.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Symbols are also allocated in frob blocks. Symbols in the awful
horrible obarray structure are chained through their next
field.
Remember that intern
looks up a symbol in an obarray, creating
one if necessary.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Markers are allocated in frob blocks, as usual. They are kept
in a buffer unordered, but in a doubly-linked list so that they
can easily be removed. (Formerly this was a singly-linked list,
but in some cases garbage collection took an extraordinarily
long time due to the O(N^2) time required to remove lots of
markers from a buffer.) Markers are removed from a buffer in
the finalize stage, in ADDITIONAL_FREE_marker()
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As mentioned above, strings are a special case. A string is logically
two parts, a fixed-size object (containing the length, property list,
and a pointer to the actual data), and the actual data in the string.
The fixed-size object is a struct Lisp_String
and is allocated in
frob blocks, as usual. The actual data is stored in special
string-chars blocks, which are 8K blocks of memory.
Currently-allocated strings are simply laid end to end in these
string-chars blocks, with a pointer back to the struct Lisp_String
stored before each string in the string-chars block. When a new string
needs to be allocated, the remaining space at the end of the last
string-chars block is used if there’s enough, and a new string-chars
block is created otherwise.
There are never any holes in the string-chars blocks due to the string
compaction and relocation that happens at the end of garbage collection.
During the sweep stage of garbage collection, when objects are
reclaimed, the garbage collector goes through all string-chars blocks,
looking for unused strings. Each chunk of string data is preceded by a
pointer to the corresponding struct Lisp_String
, which indicates
both whether the string is used and how big the string is, i.e. how to
get to the next chunk of string data. Holes are compressed by
block-copying the next string into the empty space and relocating the
pointer stored in the corresponding struct Lisp_String
.
This means you have to be careful with strings in your code.
See the section above on GCPRO
ing.
Note that there is one situation not handled: a string that is too big
to fit into a string-chars block. Such strings, called big
strings, are all malloc()
ed as their own block. (#### Although it
would make more sense for the threshold for big strings to be somewhat
lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that
this was indeed the case formerly—indeed, the threshold was set at
1/8—but Mly forgot about this when rewriting things for 19.8.)
Note also that the string data in string-chars blocks is padded as
necessary so that proper alignment constraints on the struct
Lisp_String
back pointers are maintained.
Finally, strings can be resized. This happens in Mule when a
character is substituted with a different-length character, or during
modeline frobbing. (You could also export this to Lisp, but it’s not
done so currently.) Resizing a string is a potentially tricky process.
If the change is small enough that the padding can absorb it, nothing
other than a simple memory move needs to be done. Keep in mind,
however, that the string can’t shrink too much because the offset to the
next string in the string-chars block is computed by looking at the
length and rounding to the nearest multiple of four or eight. If the
string would shrink or expand beyond the correct padding, new string
data needs to be allocated at the end of the last string-chars block and
the data moved appropriately. This leaves some dead string data, which
is marked by putting a special marker of 0xFFFFFFFF in the struct
Lisp_String
pointer before the data (there’s no real struct
Lisp_String
to point to and relocate), and storing the size of the dead
string data (which would normally be obtained from the now-non-existent
struct Lisp_String
) at the beginning of the dead string data gap.
The string compactor recognizes this special 0xFFFFFFFF marker and
handles it correctly.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Not yet documented.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.