The XEmacs C code is extremely complex and intricate, and there are many rules that are more or less consistently followed throughout the code. Many of these rules are not obvious, so they are explained here. It is of the utmost importance that you follow them. If you don’t, you may get something that appears to work, but which will crash in odd situations, often in code far away from where the actual breakage is.
See also Coding for Mule.
The C code is actually written in a dialect of C called Clean C,
meaning that it can be compiled, warning-free, with either a C or C++
compiler. Coding in Clean C has several advantages over plain
C. C++ compilers are more nit-picking, and a number of coding errors
have been found by compiling with C++. The ability to use both C and
C++ tools means that a greater variety of development tools is
available to the developer. In addition, the ability to overload
operators in C++ means it is possible, for error-checking purposes, to
redefine certain simple types (normally defined as aliases for simple
built-in types such as unsigned char or long) as classes, strictly
limiting the permissible operations and catching illegal implicit casts
and such.
XEmacs follows the GNU coding standards, which are documented separately in the GNU Coding Standards manual. This section mainly documents standards that are not included in that document; typically this consists of standards that are specifically relevant to the XEmacs code itself.
First, a recap of the GNU standards:
Now, the XEmacs coding standards:
struct foobar;
go into the “types” section of ‘lisp.h’.
Every module includes ‘<config.h>’ (angle brackets so that ‘--srcdir’ works correctly; ‘config.h’ may or may not be in the same directory as the C sources) and ‘lisp.h’. ‘config.h’ must always be included before any other header files (including system header files) to ensure that certain tricks played by various ‘s/’ and ‘m/’ files work out correctly.
When including header files, always use angle brackets, not double quotes, except when the file to be included is always in the same directory as the including file. If either file is a generated file, then that is not likely to be the case. In order to understand why we have this rule, imagine what happens when you do a build in the source directory using ‘./configure’ and another build in another directory using ‘../work/configure’. There will be two different ‘config.h’ files. Which one will be used if you ‘#include "config.h"’?
Almost every module contains a syms_of_*() function and a vars_of_*()
function. The former declares any Lisp primitives you have defined and
defines any symbols you will be using. The latter declares any global
Lisp variables you have added and initializes global C variables in the
module. Important: There are stringent requirements on exactly what can
go into these functions. See the comment in ‘emacs.c’. The reason for
this is to avoid obscure unwanted interactions during initialization.
If you don’t follow these rules, you’ll be sorry! If you want to do
anything that isn’t allowed, create a complex_vars_of_*() function for
it. Doing this is tricky, though: you have to make sure your function is
called at the right time so that all the initialization dependencies
work out.
Declare each function of these kinds in ‘symsinit.h’. Make sure it’s called in the appropriate place in ‘emacs.c’. You never need to include ‘symsinit.h’ directly, because it is included by ‘lisp.h’.
All global and static variables that are to be modifiable must
be declared uninitialized. This means that you may not use the
“declare with initializer” form for these variables, such as
int some_variable = 0;. The reason for this has to do with some kludges
done during the dumping process: If possible, the initialized data
segment is re-mapped so that it becomes part of the (unmodifiable) code
segment in the dumped executable. This allows this memory to be shared
among multiple running XEmacs processes. XEmacs is careful to place as
much constant data as possible into initialized variables during the
‘temacs’ phase.
Please note: This kludge only works on a few systems nowadays, and is rapidly becoming irrelevant because most modern operating systems provide copy-on-write semantics. All data is initially shared between processes, and a private copy is automatically made (on a page-by-page basis) when a process first attempts to write to a page of memory.
Formerly, there was a requirement that static variables not be declared
inside of functions. This had to do with another hack along the same
vein as what was just described: old USG systems put statically-declared
variables in the initialized data space, so those header files had a
#define static declaration. (That way, the data-segment remapping
described above could still work.) This fails badly on static variables
inside of functions, which suddenly become automatic variables;
therefore, you weren’t supposed to have any of them. This awful kludge
has been removed in XEmacs because
Here are things to know when you create a new source file:
Every ‘.c’ file should #include <config.h> first. Almost all
‘.c’ files should #include "lisp.h" second.
‘config.h sheap-adjust.h paths.h Emacs.ad.h’
The basic rule is that you should assume that builds use ‘--srcdir’, so the ‘#include <...>’ syntax must be used whenever the generated file to be included may be in a different directory at compile time. The non-obvious C rule is that ‘#include "..."’ means to search for the included file in the same directory as the including file, not in the current directory. Normally this is not a problem, but when building with ‘--srcdir’, ‘make’ will search the ‘VPATH’ for you, while the C compiler knows nothing about it.
Of course the low-level implementation language of XEmacs is C, but much of that uses the Lisp engine to do its work. However, because the code is “inside” of the protective containment shell around the “reactor core,” you’ll see lots of complex “plumbing” needed to do the work and “safety mechanisms,” whose failure results in a meltdown. This section provides a quick overview (or review) of the various components of the implementation of Lisp objects.
Two typographic conventions help to identify C objects that implement Lisp objects. The first is that capitalized identifiers, especially those beginning with the letters ‘Q’, ‘V’, ‘F’, and ‘S’, for C variables and functions, and C macros beginning with the letter ‘X’, are used to implement Lisp. The second is that where Lisp uses the hyphen ‘-’ in symbol names, the corresponding C identifiers use the underscore ‘_’. Of course, since XEmacs Lisp contains interfaces to many external libraries, those external names will follow the coding conventions their authors chose, and may overlap the “XEmacs name space.” However, these cases are usually pretty obvious.
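The naming convention can be illustrated with a small stand-alone sketch. The helper `lisp_name_to_c_name` is hypothetical and is not part of the XEmacs sources; it only covers the simple case of names made of alphanumerics and hyphens, and assumes the caller supplies the prefix letter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an XEmacs function: build the conventional
   C identifier for a Lisp name by prepending PREFIX ('F', 'Q', 'V', ...)
   and turning each '-' into '_'.  Returns 0 on success, or -1 if OUT is
   too small.  Names containing other special characters are spelled out
   by hand in the real sources and are not handled here. */
int
lisp_name_to_c_name (char prefix, const char *lisp_name,
                     char *out, size_t out_size)
{
  size_t len, i;

  assert (lisp_name != NULL && out != NULL);
  len = strlen (lisp_name);

  if (out_size < len + 2)       /* prefix + name + terminating NUL */
    return -1;

  out[0] = prefix;
  for (i = 0; i < len; i++)
    out[i + 1] = (lisp_name[i] == '-') ? '_' : lisp_name[i];
  out[len + 1] = '\0';
  return 0;
}
```

Under these assumptions, the Lisp name forward-char with prefix ‘F’ maps to Fforward_char.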
All Lisp objects are handled indirectly. The Lisp_Object
type is usually a pointer to a structure, except for a very small number
of types with immediate representations (currently characters and
fixnums). However, these types cannot be directly operated on in C
code, either, so they can also be considered indirect. Types that do
not have an immediate representation always have a C typedef Lisp_type
for a corresponding structure.
In older code, it was common practice to pass around pointers to
Lisp_type, but this is now deprecated in favor of using Lisp_Object
for all function arguments and return values that are Lisp objects.
The Xtype macro is used to extract the pointer and cast it to
(Lisp_type *) for the desired type.
Convention: macros whose names begin with ‘X’ operate on
Lisp_Objects and do no type-checking. Many such macros are type
extractors, but others implement Lisp operations in C (e.g.,
XCAR implements the Lisp car function). These are unsafe,
and must only be used where types of all data have already been checked.
Such macros are only applied to Lisp_Objects. In internal
implementations where the pointer has already been converted, the
structure is operated on directly using the C -> member access
operator.
The typeP, CHECK_type, and CONCHECK_type macros are used to test
types. The first returns a Boolean value, and the latter two signal
errors. (The ‘CONCHECK’ variety allows execution to be CONtinued under
some circumstances, thus the name.) Functions which expect to be passed
user data invariably call ‘CHECK’ macros on arguments.
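The division of labor between predicates, checks, and unchecked ‘X’ extractors can be sketched with a toy model. Nothing below is the real XEmacs representation: `Toy_Object`, `TOY_CONSP`, `XTOY_CAR`, and `toy_safe_car` are hypothetical names, and the error "signal" is reduced to a counter so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the real Lisp_Object representation and the real
   macros live in 'lisp.h' and 'lrecord.h' and differ from this. */
enum toy_type { TOY_CONS, TOY_STRING };

struct toy_header { enum toy_type type; };
typedef struct toy_header *Toy_Object;   /* stands in for Lisp_Object */

struct Toy_Cons
{
  struct toy_header header;              /* must be the first member */
  Toy_Object car, cdr;
};

int toy_error_count;   /* stands in for signalling a Lisp error */

/* Predicate: yields a C int and never signals. */
#define TOY_CONSP(obj) ((obj)->type == TOY_CONS)

/* Unchecked extractors, in the spirit of the 'X' macros. */
#define XTOY_CONS(obj) ((struct Toy_Cons *) (obj))
#define XTOY_CAR(obj)  (XTOY_CONS (obj)->car)

/* Checked access: test the type first, and only then use the
   unchecked extractor. */
Toy_Object
toy_safe_car (Toy_Object obj)
{
  assert (obj != NULL);
  if (!TOY_CONSP (obj))
    {
      toy_error_count++;   /* real code would signal an error here */
      return NULL;
    }
  return XTOY_CAR (obj);
}
```

The point of the split is that the cast in the extractor is only valid once the type test has succeeded; code that has already checked its data can skip the test and use the extractor directly.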
There are many types of specialized Lisp objects implemented in C, but the most pervasive type is the symbol. Symbols are used as identifiers, variables, and functions.
Convention: Global variables whose names begin with ‘Q’
are constants whose value is a symbol. The name of the variable should
be derived from the name of the symbol using the same rules as for Lisp
primitives. Such variables allow the C code to check whether a
particular Lisp_Object is equal to a given symbol. Symbols are
Lisp objects, so these variables may be passed to Lisp primitives. (A
tempting alternative to the use of ‘Q...’ variables is to call the
intern function at initialization in the vars_of_module function. But
this does not staticpro the symbol, which in theory could get
uninterned, and then garbage collected while you’re not looking. You
could staticpro yourself, but in a production XEmacs intern and
staticpro is all that DEFSYMBOL does, while in a debugging
XEmacs it also does some error-checking, which you normally want.)
Convention: Global variables whose names begin with ‘V’
are variables that contain Lisp objects. The convention here is that
all global variables of type Lisp_Object begin with ‘V’, and
no others do (not even fixnum and boolean variables that have Lisp
equivalents). Most of the time, these variables have equivalents in
Lisp, which are defined via the ‘DEFVAR’ family of macros, but some
don’t. Since the variable’s value is a Lisp_Object, it can be
passed to Lisp primitives.
The implementation of Lisp primitives is more complex.
Convention: Global variables with names beginning with ‘S’
contain a structure that allows the Lisp engine to identify and call a C
function. In modern versions of XEmacs, these identifiers are almost
always completely hidden in the DEFUN and SUBR macros, but
you will encounter them if you look at very old versions of XEmacs or at
GNU Emacs. Convention: Functions with names beginning with
‘F’ implement Lisp primitives. Of course all their arguments and
their return values must be Lisp_Objects. (This is hidden in the
DEFUN macro.)
Lisp lists are popular data structures in the C code as well as in
Elisp. There are two sets of macros that iterate over lists.
EXTERNAL_LIST_LOOP_n should be used when the list has been
supplied by the user, and cannot be trusted to be acyclic and
nil-terminated. A malformed-list or circular-list error
will be generated if the list being iterated over is not entirely
kosher. LIST_LOOP_n, on the other hand, is faster and less
safe, and can be used only on trusted lists.
Related macros are GET_EXTERNAL_LIST_LENGTH and GET_LIST_LENGTH,
which calculate the length of a list; GET_EXTERNAL_LIST_LENGTH
also validates the properness of the list. The macros
EXTERNAL_LIST_LOOP_DELETE_IF and LIST_LOOP_DELETE_IF delete elements
satisfying some predicate from a Lisp list.
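The difference between the trusted and the untrusted list macros can be sketched on a toy cons cell. This is not how the XEmacs macros are written; `toy_list_length` and `toy_external_list_length` are hypothetical functions, NULL plays the role of nil, and circularity is detected with a simple slow-pointer/fast-pointer comparison chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cons cell; NULL plays the role of nil. */
struct toy_cons { int car; struct toy_cons *cdr; };

/* Trusted variant: fast, but loops forever on a circular list. */
long
toy_list_length (const struct toy_cons *list)
{
  long len = 0;
  for (; list != NULL; list = list->cdr)
    len++;
  return len;
}

/* Untrusted variant: returns the length, or -1 for a circular list.
   FAST advances on every step and SLOW on every second step, so on a
   circular list FAST eventually catches up with SLOW. */
long
toy_external_list_length (const struct toy_cons *list)
{
  const struct toy_cons *slow = list;
  const struct toy_cons *fast = list;
  long len = 0;

  while (fast != NULL)
    {
      fast = fast->cdr;
      len++;
      if (len % 2 == 0)
        slow = slow->cdr;
      if (fast != NULL && fast == slow)
        return -1;              /* real code would signal an error */
    }
  return len;
}
```

The extra comparison on every step is the price of safety, which is why the faster variant remains useful for lists built by the C code itself.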
At the lowest levels, XEmacs makes heavy use of object-oriented techniques to promote code-sharing and uniform interfaces for different devices and platforms. Commonly, but not always, such objects are “wrapped” and exported to Lisp as Lisp objects. Usually they use the internal structures developed for Lisp objects (the ‘lrecord’ structure) in order to take advantage of Lisp memory management. Unfortunately, XEmacs was originally written in C, so these techniques are based on heavy use of C macros.
A module defining a class is likely to use most of the following declarations and macros. In the following, the notation ‘<type>’ will stand for the full name of the class, and will be capitalized in the way normal for its context. The notation ‘<typ>’ will stand for the abbreviated form commonly used in macro names, while ‘ty’ will be used as the typical name for instances of the class. (See the entry for ‘MAYBE_<TY>METH’ below for an example using all three notations.)
In the interface (‘.h’ file), the following declarations are used often. Others may be used for particular modules. Since they’re quite short in most cases, the definitions are given as well. The generic macros used are defined in ‘lisp.h’ or ‘lrecord.h’.
This refers to the internal structure used by C code. The XEmacs coding style now forbids passing pointers to ‘Lisp_<Type>’ structures into or out of a function; instead, a ‘Lisp_Object’ should be passed or returned (created using ‘wrap_<type>’, if necessary).
Declares a Lisp object for ‘<Type>’, which is the unit of allocation.
Turns a Lisp_Object into a pointer to ‘struct Lisp_<Type>’.
Turns a pointer to ‘struct Lisp_<Type>’ into a Lisp_Object.
Tests whether a given Lisp_Object is of type ‘Lisp_<Type>’.
Returns a C int, not a Lisp Boolean value.
Tests whether a given Lisp_Object is of type ‘Lisp_<Type>’,
and signals a Lisp error if not. The ‘CHECK’ version of the macro
never returns if the type is wrong, while the ‘CONCHECK’ version
can return if the user catches it in the debugger and explicitly
requests a return.
Return a function pointer for the method for an object TY of class ‘Lisp_<Type>’, or ‘NULL’ if there is none for this type.
Test whether the class that TY is an instance of has the method.
Call the method on ‘args’. ‘args’ must be enclosed in parentheses in the call. It is the programmer’s responsibility to ensure that the method is available. The standard convenience macro ‘MAYBE_<TYP>METH’ is often provided for the common case where a void-returning method of ‘Type’ is called.
Call a void-returning ‘<Type>’ method, if it exists. Note the use of the ‘do ... while (0)’ idiom to give the macro call C statement semantics. The full definition is equally idiomatic:
    #define MAYBE_<TYP>METH(ty, m, args) do {                 \
      Lisp_<Type> *maybe_<typ>meth_ty = (ty);                 \
      if (HAS_<TYP>METH_P (maybe_<typ>meth_ty, m))            \
        <TYP>METH (maybe_<typ>meth_ty, m, args);              \
    } while (0)
The use of macros for invoking an object’s methods makes life a bit difficult for the student or maintainer when browsing the code. In particular, calls are of the form ‘<TYP>METH (ty, some_method, (x, y))’, but definitions typically are for ‘<subtype>_some_method’. Thus, when you are trying to find calls, you need to grep for ‘some_method’, but this will also catch calls and definitions of that method for instances of other subtypes of ‘<Type>’, and there may be a rather large number of them.
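The method-dispatch pattern can be tried out in a stand-alone toy. All names here (`toy_console`, `TOY_CONMETH`, `tty_ring_bell`, and so on) are hypothetical and merely follow the naming scheme described above; the real macros are defined in the XEmacs headers and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical. */
struct toy_console;

struct toy_console_methods
{
  const char *name;
  void (*ring_bell_method) (struct toy_console *con, int volume);
  void (*flush_method) (struct toy_console *con);   /* may be NULL */
};

struct toy_console
{
  struct toy_console_methods *conmeths;
  int bell_count;
};

/* Call method M; ARGS must be a parenthesized argument list. */
#define TOY_CONMETH(c, m, args)  (((c)->conmeths->m##_method) args)

/* Does the class of C provide method M? */
#define TOY_HAS_CONMETH_P(c, m)  ((c)->conmeths->m##_method != NULL)

/* Call a void-returning method only if it exists. */
#define TOY_MAYBE_CONMETH(c, m, args) do {              \
  struct toy_console *maybe_conmeth_c = (c);            \
  if (TOY_HAS_CONMETH_P (maybe_conmeth_c, m))           \
    TOY_CONMETH (maybe_conmeth_c, m, args);             \
} while (0)

/* A subtype's definition is named <subtype>_<method>. */
void
tty_ring_bell (struct toy_console *con, int volume)
{
  assert (con != NULL);
  con->bell_count += volume;
}

/* The "tty" subtype provides ring_bell but no flush method. */
struct toy_console_methods tty_console_methods =
  { "tty", tty_ring_bell, NULL };
```

A call then reads `TOY_MAYBE_CONMETH (con, ring_bell, (con, 2));`, while the definition it reaches is `tty_ring_bell`, which shows why a search for the bare method name is needed to connect call sites with definitions.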
Here is a checklist of things to do when creating a new lisp object type named foo:
syms_of_foo, etc. to ‘foo.c’
syms_of_foo, etc. to ‘symsinit.h’
syms_of_foo, etc. to ‘emacs.c’
CHECK_FOO and FOOP to ‘foo.h’
enum lrecord_type
DEFINE_*_LISP_OBJECT() to ‘foo.c’
INIT_LISP_OBJECT call to syms_of_foo.c
Lisp primitives are Lisp functions implemented in C. The details of interfacing the C function so that Lisp can call it are handled by a few C macros. The only way to really understand how to write new C code is to read the source, but we can explain some things here.
An example of a special operator is the definition of prog1, from
‘eval.c’. (An ordinary function would have the same general
appearance.)
    DEFUN ("prog1", Fprog1, 1, UNEVALLED, 0, /*
    Similar to `progn', but the value of the first form is returned.
    \(prog1 FIRST BODY...): All the arguments are evaluated sequentially.
    The value of FIRST is saved during evaluation of the remaining args,
    whose values are discarded.
    */
           (args))
    {
      /* This function can GC */
      REGISTER Lisp_Object val, form, tail;
      struct gcpro gcpro1;

      val = Feval (XCAR (args));

      GCPRO1 (val);

      LIST_LOOP_3 (form, XCDR (args), tail)
        Feval (form);

      UNGCPRO;
      return val;
    }
Let’s start with a precise explanation of the arguments to the
DEFUN macro. Here is a template for them:
    DEFUN (lname, fname, min_args, max_args, interactive, /*
    docstring
    */
           (arglist))
lname
This string is the name of the Lisp symbol to define as the function
name; in the example above, it is "prog1".
fname
This is the C function name for this function. This is the name that is
used in C code for calling the function. The name is, by convention,
‘F’ prepended to the Lisp name, with all dashes (‘-’) in the
Lisp name changed to underscores. Thus, to call this function from C
code, call Fprog1. Remember that the arguments are of type
Lisp_Object; various macros and functions for creating values of
type Lisp_Object are declared in the file ‘lisp.h’.
Primitives whose names are special characters (e.g. + or <) are named
by spelling out, in some fashion, the special character: e.g. Fplus()
or Flss(). Primitives whose names begin with normal alphanumeric
characters but also contain special characters are spelled out in some
creative way, e.g. let* becomes FletX().
Each function also has an associated structure that holds the data for
the subr object that represents the function in Lisp. This structure
conveys the Lisp symbol name to the initialization routine that will
create the symbol and store the subr object as its definition. The C
variable name of this structure is always ‘S’ prepended to the
fname. You hardly ever need to be aware of the existence of this
structure, since DEFUN plus DEFSUBR takes care of all the
details.
min_args
This is the minimum number of arguments that the function requires. The
function prog1 allows a minimum of one argument.
max_args
This is the maximum number of arguments that the function accepts, if
there is a fixed maximum. Alternatively, it can be UNEVALLED,
indicating a special operator that receives unevaluated arguments, or
MANY, indicating an unlimited number of evaluated arguments (the
C equivalent of &rest). Both UNEVALLED and MANY are macros. If
max_args is a number, it may not be less than min_args and it may not
be greater than 8. (If you need to add a function with more than 8
arguments, use the MANY form. Resist the urge to edit the definition of
DEFUN in ‘lisp.h’. If you do it anyway, make sure to also add another
clause to the switch statement in primitive_funcall().)
interactive
This is an interactive specification, a string such as might be used as
the argument of interactive in a Lisp function. In the case of
prog1, it is 0 (a null pointer), indicating that prog1
cannot be called interactively. A value of "" indicates a
function that should receive no arguments when called interactively.
docstring
This is the documentation string. It is written just like a documentation string for a function defined in Lisp; in particular, the first line should be a single sentence. Note how the documentation string is enclosed in a comment, none of the documentation is placed on the same lines as the comment-start and comment-end characters, and the comment-start characters are on the same line as the interactive specification. ‘make-docfile’, which scans the C files for documentation strings, is very particular about what it looks for, and will not properly extract the doc string if it’s not in this exact format.
In order to make both ‘etags’ and ‘make-docfile’ happy, make
sure that the DEFUN line contains the lname and
fname, and that the comment-start characters for the doc string
are on the same line as the interactive specification, and put a newline
directly after them (and before the comment-end characters).
arglist
This is the comma-separated list of arguments to the C function. For a
function with a fixed maximum number of arguments, provide a C argument
for each Lisp argument. In this case, unlike regular C functions, the
types of the arguments are not declared; they are simply always of type
Lisp_Object.
The names of the C arguments will be used as the names of the arguments
to the Lisp primitive as displayed in its documentation, modulo the same
concerns described above for F... names (in particular,
underscores in the C arguments become dashes in the Lisp arguments).
There is one additional kludge: A trailing ‘_’ on the C argument is
discarded when forming the Lisp argument. This allows C language
reserved words (like default) or global symbols (like dirname) to be
used as argument names without compiler warnings or errors.
A Lisp function with max_args = UNEVALLED is a
special operator; its arguments are not evaluated. Instead it
receives one argument of type Lisp_Object, a (Lisp) list of the
unevaluated arguments, conventionally named (args).
When a Lisp function has no upper limit on the number of arguments,
specify max_args = MANY. In this case its implementation in
C actually receives exactly two arguments: the number of Lisp arguments
(an int) and the address of a block containing their values (a
Lisp_Object *). Only in this case are the C types specified
in the arglist: (int nargs, Lisp_Object *args).
Within the function Fprog1 itself, note the use of the macros
GCPRO1 and UNGCPRO. GCPRO1 is used to “protect”
a variable from garbage collection—to inform the garbage collector
that it must look in that variable and regard the object pointed at by
its contents as an accessible object. This is necessary whenever you
call Feval or anything that can directly or indirectly call
Feval (this includes the QUIT macro!). At such a time,
any Lisp object that you intend to refer to again must be protected
somehow. UNGCPRO cancels the protection of the variables that
are protected in the current function. It is necessary to do this
explicitly.
The macro GCPRO1 protects just one local variable. If you want
to protect two, use GCPRO2 instead; repeating GCPRO1 will
not work. Macros GCPRO3 and GCPRO4 also exist.
These macros implicitly use local variables such as gcpro1; you
must declare these explicitly, with type struct gcpro. Thus, if
you use GCPRO2, you must declare gcpro1 and gcpro2.
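One way to picture why the protection macros need explicitly declared locals, and why repeating the one-variable macro does not work, is a toy chain of stack-allocated records. This is an assumed model for illustration only; `toy_gcpro`, `TOY_GCPRO1`, `TOY_GCPRO2`, and `TOY_UNGCPRO` are hypothetical names and the real definitions in ‘lisp.h’ may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef void *Toy_Object;           /* stands in for Lisp_Object */

/* One record per protected variable, linked into a global chain. */
struct toy_gcpro
{
  struct toy_gcpro *next;           /* previously registered record */
  Toy_Object *var;                  /* address of the protected variable */
};

struct toy_gcpro *toy_gcprolist;    /* head of the chain; NULL when empty */

/* These macros rely on locals named gcpro1 and gcpro2 in the caller. */
#define TOY_GCPRO1(v)                                           \
  (gcpro1.next = toy_gcprolist, gcpro1.var = &(v),              \
   toy_gcprolist = &gcpro1)

#define TOY_GCPRO2(v1, v2)                                      \
  (gcpro1.next = toy_gcprolist, gcpro1.var = &(v1),             \
   gcpro2.next = &gcpro1,       gcpro2.var = &(v2),             \
   toy_gcprolist = &gcpro2)

/* Pop everything this function pushed, however many records. */
#define TOY_UNGCPRO (toy_gcprolist = gcpro1.next)

/* What a collector would do: walk the chain to find the roots. */
int
toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *p;
  for (p = toy_gcprolist; p != NULL; p = p->next)
    n++;
  return n;
}

int
toy_demo (void)
{
  Toy_Object a = NULL, b = NULL;
  struct toy_gcpro gcpro1, gcpro2;
  int seen;

  TOY_GCPRO2 (a, b);
  seen = toy_count_protected ();    /* both a and b are on the chain */
  TOY_UNGCPRO;
  assert (toy_count_protected () == 0);
  return seen;
}
```

In this model, a second `TOY_GCPRO1` in the same function would overwrite `gcpro1`, dropping the first variable from the chain, which is the kind of failure the rule about using the two-variable macro guards against.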
Note also that the general rule is caller-protects; i.e. you are only responsible for protecting those Lisp objects that you create. Any objects passed to you as arguments should have been protected by whoever created them, so you don’t in general have to protect them.
In particular, the arguments to any Lisp primitive are always
automatically GCPROed, when called “normally” from Lisp code or
bytecode. So only a few Lisp primitives that are called frequently from
C code, such as Fprogn, protect their arguments as a service to
their caller. You don’t need to protect your arguments when writing a
new DEFUN.
GCPROing is perhaps the trickiest and most error-prone part of
XEmacs coding. It is extremely important that you get this
right and use a great deal of discipline when writing this code.
See section GCPROing, for full details on how to do this.
What DEFUN actually does is declare a global structure of type
Lisp_Subr whose name begins with capital ‘SF’ and which
contains information about the primitive (e.g. a pointer to the
function, its minimum and maximum allowed arguments, a string describing
its Lisp name); DEFUN then begins a normal C function declaration
using the F... name. The Lisp subr object that is the function
definition of a primitive (i.e. the object in the function slot of the
symbol that names the primitive) actually points to this ‘SF’
structure; when Feval encounters a subr, it looks in the
structure to find out how to call the C function.
Defining the C function is not enough to make a Lisp primitive available; you must also create the Lisp symbol for the primitive (the symbol is interned; see section Obarrays) and store a suitable subr object in its function cell. (If you don’t do this, the primitive won’t be seen by Lisp code.) The code looks like this:
    DEFSUBR (fname);
Here fname is the same name you used as the second argument to
DEFUN.
This call to DEFSUBR should go in the syms_of_*() function
at the end of the module. If no such function exists, create it and
make sure to also declare it in ‘symsinit.h’ and call it from the
appropriate spot in main(). See section Writing New Modules.
Note that C code cannot call functions by name unless they are defined
in C. The way to call a function written in Lisp from C is to use
Ffuncall, which embodies the Lisp function funcall. Since
the Lisp function funcall accepts an unlimited number of
arguments, in C it takes two: the number of Lisp-level arguments, and a
one-dimensional array containing their values. The first Lisp-level
argument is the Lisp function to call, and the rest are the arguments to
pass to it. Since Ffuncall can call the evaluator, you must
protect pointers from garbage collection around the call to
Ffuncall. (However, Ffuncall explicitly protects all of
its parameters, so you don’t have to protect any pointers passed as
parameters to it.)
The C functions call0, call1, call2, and so on,
provide handy ways to call a Lisp function conveniently with a fixed
number of arguments. They work by calling Ffuncall.
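The calling convention (a count plus an array whose first element is the function) and the fixed-arity wrappers built on it can be sketched in a toy model. The names `toy_funcall`, `toy_call1`, `toy_call2`, and `toy_max` are hypothetical, the object representation is invented for the example, and garbage-collection protection is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: an object carries a number and, if it is callable,
   a C function taking (nargs, args) like a MANY-style primitive. */
struct toy_object;
typedef struct toy_object *Toy_Object;
typedef Toy_Object (*toy_subr) (int nargs, Toy_Object *args);

struct toy_object
{
  long value;
  toy_subr fn;          /* NULL for non-callable objects */
};

/* args[0] is the function to call; the rest are its arguments. */
Toy_Object
toy_funcall (int nargs, Toy_Object *args)
{
  assert (nargs >= 1 && args[0] != NULL && args[0]->fn != NULL);
  return args[0]->fn (nargs - 1, args + 1);
}

/* Fixed-arity conveniences: pack the arguments and delegate. */
Toy_Object
toy_call1 (Toy_Object fn, Toy_Object arg0)
{
  Toy_Object args[2];
  args[0] = fn;
  args[1] = arg0;
  return toy_funcall (2, args);
}

Toy_Object
toy_call2 (Toy_Object fn, Toy_Object arg0, Toy_Object arg1)
{
  Toy_Object args[3];
  args[0] = fn;
  args[1] = arg0;
  args[2] = arg1;
  return toy_funcall (3, args);
}

/* A sample variadic "primitive": return the argument with the
   largest value, or NULL when called with no arguments. */
Toy_Object
toy_max (int nargs, Toy_Object *args)
{
  Toy_Object best = NULL;
  int i;
  for (i = 0; i < nargs; i++)
    if (best == NULL || args[i]->value > best->value)
      best = args[i];
  return best;
}
```

The wrappers add nothing beyond packing their arguments into an array, which mirrors the statement that the fixed-arity helpers work by calling the general function.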
‘eval.c’ is a very good file to look through for examples; ‘lisp.h’ contains the definitions for important macros and functions.
Comments are a lifeline for programmers trying to understand tricky code. In general, the less obvious it is what you are doing, the more you need a comment, and the more detailed it needs to be. You should always be on guard when you’re writing code for stuff that’s tricky, and should constantly be putting yourself in someone else’s shoes and asking if that person could figure out without much difficulty what’s going on. (Assume they are a competent programmer who understands the essentials of how the XEmacs code is structured but doesn’t know much about the module you’re working on or any algorithms you’re using.) If you’re not sure whether they would be able to, add a comment. Always err on the side of more comments, rather than less.
Generally, when making comments, there is no need to attribute them with your name or initials. This especially goes for small, easy-to-understand, non-opinionated ones. Also, comments indicating where, when, and by whom a file was changed are strongly discouraged, and in general will be removed as they are discovered. This is exactly what ‘ChangeLogs’ are there for. However, it can occasionally be useful to mark exactly where (but not when or by whom) changes are made, particularly when making small changes to a file imported from elsewhere. These marks help when later on a newer version of the file is imported and the changes need to be merged. (If everything were always kept in CVS, there would be no need for this. But in practice, this often doesn’t happen, or the CVS repository is later on lost or unavailable to the person doing the update.)
When putting in an explicit opinion in a comment, you should always attribute it with your name and the date. This also goes for long, complex comments explaining in detail the workings of something – by putting your name there, you make it possible for someone who has questions about how that thing works to determine who wrote the comment so they can write to them. Use your actual name or your alias at xemacs.org, and not your initials or nickname, unless that is generally recognized (e.g. ‘jwz’). Even then, please consider requesting a virtual user at xemacs.org (forwarding address; we can’t provide an actual mailbox). Otherwise, give first and last name. If you’re not a regular contributor, you might consider putting your email address in – it may be in the ChangeLog, but after a while ChangeLogs have a tendency of disappearing or getting muddled. (E.g. your comment may get copied somewhere else or even into another program, and tracking down the proper ChangeLog may be very difficult.)
If you come across an opinion that is not or is no longer valid, or you come across any comment that no longer applies but you want to keep it around, enclose it in ‘[[ ’ and ‘ ]]’ marks and add a comment afterwards explaining why the preceding comment is no longer valid. Put your name on this comment, as explained above.
Just as comments are a lifeline to programmers, incorrect comments are death. If you come across an incorrect comment, immediately correct it or flag it as incorrect, as described in the previous paragraph. Whenever you work on a section of code, always make sure to update any comments to be correct – or, at the very least, flag them as incorrect.
To indicate a “todo” or other problem, use four pound signs – i.e. ‘####’.
Global variables whose names begin with ‘Q’ are constants whose
value is a symbol of a particular name. The name of the variable should
be derived from the name of the symbol using the same rules as for Lisp
primitives. These variables are initialized using a call to
defsymbol() in the syms_of_*() function. (This call
interns a symbol, sets the C variable to the resulting Lisp object, and
calls staticpro() on the C variable to tell the
garbage-collection mechanism about this variable. What
staticpro() does is add a pointer to the variable to a large
global array; when garbage-collection happens, all pointers listed in
the array are used as starting points for marking Lisp objects. This is
important because it’s quite possible that the only current reference to
the object is the C variable. In the case of symbols, the
staticpro() doesn’t matter all that much because the symbol is
contained in obarray, which is itself staticpro()ed.
However, it’s possible that a naughty user could do something like
uninterning the symbol out of obarray or even setting
obarray to a different value [although this is likely to make
XEmacs crash!].)
Please note: It is potentially deadly if you declare a
‘Q...’ variable in two different modules. The two calls to
defsymbol() are no problem, but some linkers will complain about
multiply-defined symbols. The most insidious aspect of this is that
often the link will succeed anyway, but then the resulting executable
will sometimes crash in obscure ways during certain operations!
To avoid this problem, declare any symbols with common names (such as
text) that are not obviously associated with this particular
module in the file ‘general-slots.h’. The “-slots” suffix
indicates that this is a file that is included multiple times in
‘general.c’. Redefinition of preprocessor macros allows the
effects to be different in each context, so this is actually more
convenient and less error-prone than doing it in your module.
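The multiple-inclusion technique can be imitated in a single file. In this sketch a list macro, `TOY_GENERAL_SLOTS`, stands in for the contents of the included file, and `TOY_SYMBOL`, `toy_syms_of_general`, and `toy_symbol_known_p` are hypothetical names; the symbols are modeled as plain C strings rather than Lisp objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the contents of a "-slots" file: one entry per symbol.
   In a real multi-file setup this list would live in its own header
   and be #included wherever it is expanded. */
#define TOY_GENERAL_SLOTS               \
  TOY_SYMBOL (Qtext, "text")            \
  TOY_SYMBOL (Qcolor, "color")          \
  TOY_SYMBOL (Qsize, "size")

/* First expansion: define one C variable per entry. */
#define TOY_SYMBOL(var, name) const char *var;
TOY_GENERAL_SLOTS
#undef TOY_SYMBOL

/* Second expansion: initialize every variable. */
void
toy_syms_of_general (void)
{
#define TOY_SYMBOL(var, name) var = name;
  TOY_GENERAL_SLOTS
#undef TOY_SYMBOL
  assert (Qtext != NULL);
}

/* Third expansion: the same list drives a lookup. */
int
toy_symbol_known_p (const char *s)
{
#define TOY_SYMBOL(var, name) if (strcmp (s, name) == 0) return 1;
  TOY_GENERAL_SLOTS
#undef TOY_SYMBOL
  return 0;
}
```

Because each symbol appears exactly once in the list, the variable definition and its initialization cannot drift apart, and no second module has a reason to define the same variable.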
Global variables whose names begin with ‘V’ are variables that
contain Lisp objects. The convention here is that all global variables
of type Lisp_Object begin with ‘V’, and all others don’t
(including fixnum and boolean variables that have Lisp
equivalents). Most of the time, these variables have equivalents in
Lisp, but some don’t. Those that do are declared this way by a call to
DEFVAR_LISP() in the vars_of_*() initializer for the
module. What this does is create a special symbol-value-forward
Lisp object that contains a pointer to the C variable, intern a symbol
whose name is as specified in the call to DEFVAR_LISP(), and set
its value to the symbol-value-forward Lisp object; it also calls
staticpro() on the C variable to tell the garbage-collection
mechanism about the variable. When eval (or actually
symbol-value) encounters this special object in the process of
retrieving a variable’s value, it follows the indirection to the C
variable and gets its value. setq does similar things so that
the C variable gets changed.
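The indirection described above can be modelled with a toy example. This is a sketch under heavy assumptions: none of the toy_ names below exist in XEmacs, a long stands in for a Lisp object, and garbage collection is ignored entirely. It only shows how reading and setting a symbol can be redirected to a C variable.

```c
#include <stddef.h>

/* Toy stand-in for a Lisp object; not the real Lisp_Object. */
typedef long toy_object;

/* Plays the role of the symbol-value-forward object. */
struct toy_forward
{
  toy_object *c_variable;       /* points at the C global */
};

struct toy_symbol
{
  const char *name;
  struct toy_forward *forward;  /* non-NULL if the value lives in C */
  toy_object value;             /* used only for ordinary symbols */
};

/* Plays the role of a 'V...' variable. */
toy_object Vtoy_setting = 42;

/* Reading the symbol follows the indirection to the C variable. */
toy_object
toy_symbol_value (const struct toy_symbol *sym)
{
  return sym->forward ? *sym->forward->c_variable : sym->value;
}

/* Setting the symbol changes the C variable itself. */
void
toy_set (struct toy_symbol *sym, toy_object new_value)
{
  if (sym->forward)
    *sym->forward->c_variable = new_value;
  else
    sym->value = new_value;
}
```

With a toy_symbol whose forward field points at Vtoy_setting, toy_set() on the symbol is visible to C code reading Vtoy_setting, and an assignment to Vtoy_setting in C is visible through toy_symbol_value().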
Whether or not you DEFVAR_LISP() a variable, you need to
initialize it in the vars_of_*() function; otherwise it will end
up as all zeroes, which is the integer 0 (not nil), and
this is probably not what you want. Also, if the variable is not
DEFVAR_LISP()ed, you must call staticpro() on the
C variable in the vars_of_*() function. Otherwise, the
garbage-collection mechanism won’t know that the object in this variable
is in use, and will happily collect it and reuse its storage for another
Lisp object, and you will be the one who’s unhappy when you can’t figure
out how your variable got overwritten.
Heavily used small code fragments need to be fast. The traditional way to implement such code fragments in C is with macros. But macros in C are known to be broken.
Macro arguments that are repeatedly evaluated may suffer from repeated side effects or suboptimal performance.
Variable names used in macros may collide with caller’s variables, causing (at least) unwanted compiler warnings.
In order to solve these problems, and maintain statement semantics,
one should use the do { ... } while (0) trick (which safely
works inside of if statements) while trying to reference macro
arguments exactly once using local variables.
Let’s take a look at this poor macro definition:
#define MARK_OBJECT(obj) \
  if (!marked_p (obj)) mark_object (obj), did_mark = 1
This macro evaluates its argument twice, and also fails if used like this, because the else binds to the if hidden inside the macro rather than to the outer if:
if (flag) MARK_OBJECT (obj); else
A much better definition is
#define MARK_OBJECT(obj) do { \
  Lisp_Object mo_obj = (obj); \
  if (!marked_p (mo_obj)) \
    { \
      mark_object (mo_obj); \
      did_mark = 1; \
    } \
} while (0)
Notice the elimination of double evaluation by using the local variable with the obscure name. Writing safe and efficient macros requires great care. The one problem with macros that cannot be portably worked around is this: since a C block has no value, a macro used as an expression rather than a statement cannot use the techniques just described to avoid multiple evaluation.
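The difference between the two definitions can be observed directly with an argument that has a side effect. The following is a minimal sketch, not code from the XEmacs sources: objects are plain int indices, and marked_p(), mark_object(), next_object() and the demo_ functions are hypothetical stand-ins written only for this illustration.

```c
/* Hypothetical stand-ins: objects are small integers indexing an array. */
static int marked[8];
static int did_mark;
static int evaluations;   /* how often the macro argument was evaluated */

static int marked_p (int obj) { return marked[obj]; }
static void mark_object (int obj) { marked[obj] = 1; }

/* An argument expression with a side effect: it counts its evaluations. */
static int next_object (void) { evaluations++; return 3; }

/* The poor definition: the argument appears twice in the expansion. */
#define MARK_OBJECT_BAD(obj) \
  if (!marked_p (obj)) mark_object (obj), did_mark = 1

/* The better definition: the argument is evaluated once into a local. */
#define MARK_OBJECT_GOOD(obj) do { \
  int mo_obj = (obj); \
  if (!marked_p (mo_obj)) \
    { \
      mark_object (mo_obj); \
      did_mark = 1; \
    } \
} while (0)

/* Number of argument evaluations caused by the poor macro. */
int
demo_count_bad (void)
{
  marked[3] = 0;
  did_mark = 0;
  evaluations = 0;
  MARK_OBJECT_BAD (next_object ());
  return evaluations;
}

/* Number of argument evaluations caused by the better macro. */
int
demo_count_good (void)
{
  marked[3] = 0;
  did_mark = 0;
  evaluations = 0;
  MARK_OBJECT_GOOD (next_object ());
  return evaluations;
}
```

With an unmarked object, demo_count_bad() reports two evaluations of the argument and demo_count_good() reports one.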
In most cases where a macro has function semantics, an inline function
is a better implementation technique. Modern compiler optimizers tend
to inline functions even if they have no inline keyword, and
configure magic ensures that the inline keyword can be safely
used as an additional compiler hint. Inline functions used in a single
.c file are easy. The function must already be defined to be static.
Just add another inline keyword to the definition.
inline static int
heavily_used_small_function (int arg)
{
  ...
}
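As a rough illustration of why an inline function is often preferable to an expression macro, compare the two below. Everything here is hypothetical (DEMO_MAX_MACRO, demo_max() and the helper functions are not XEmacs names); the point is only that a function argument is evaluated exactly once, whereas the expression macro cannot use a local variable to achieve that.

```c
static int demo_calls;

/* An argument expression with a side effect: it counts its calls. */
static int demo_next (void) { return ++demo_calls; }

/* Expression macro: whichever argument is larger is evaluated twice. */
#define DEMO_MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline function: each argument is evaluated once, before the call. */
static inline int
demo_max (int a, int b)
{
  return a > b ? a : b;
}

/* Number of demo_next() calls made through the macro. */
int
demo_macro_calls (void)
{
  demo_calls = 0;
  (void) DEMO_MAX_MACRO (demo_next (), 0);
  return demo_calls;
}

/* Number of demo_next() calls made through the inline function. */
int
demo_inline_calls (void)
{
  demo_calls = 0;
  (void) demo_max (demo_next (), 0);
  return demo_calls;
}
```

Here the macro calls demo_next() twice and the inline function calls it once.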
Inline functions in header files are trickier, because we would like to make the following optimization if the function is not inlined (for example, because we’re compiling for debugging). We would like the function to be defined externally exactly once, and each calling translation unit would create an external reference to the function, instead of including a definition of the inline function in the object code of every translation unit that uses it. This optimization is currently only available for gcc. But you don’t have to worry about the trickiness; just define your inline functions in header files using this pattern:
DECLARE_INLINE_HEADER (
int
i_used_to_be_a_crufty_macro_but_look_at_me_now (int arg)
)
{
  ...
}
We use DECLARE_INLINE_HEADER rather than just the modifier
INLINE_HEADER to prevent warnings when compiling with gcc
-Wmissing-declarations. I consider issuing this warning for inline
functions a gcc bug, but the gcc maintainers disagree.
Every header which contains inline functions, either directly by using
DECLARE_INLINE_HEADER or indirectly by using DECLARE_LISP_OBJECT, must
be added to the includes of ‘inline.c’ to make the optimization
described above work. (Optimization note: if all INLINE_HEADER
functions are in fact inlined in all translation units, then the linker
can just discard inline.o, since it contains only unreferenced code.)
The three golden rules of macros:
NOTE: The functions and macros below are given full prototypes in their docs, even when the implementation is a macro. In such cases, passing an argument of a type other than expected will produce undefined results. Also, given that macros can do things functions can’t (in particular, directly modify arguments as if they were passed by reference), the declaration syntax has been extended to include the call-by-reference syntax from C++, where an & after a type indicates that the argument is an lvalue and is passed by reference, i.e. the function can modify its value. (This is equivalent in C to passing a pointer to the argument, but without the need to explicitly worry about pointers.)
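To make the notation concrete, here is a small sketch. DEMO_INCREMENT and demo_increment() are hypothetical names invented for this example; in the extended notation the macro would be documented as void DEMO_INCREMENT (int &counter), while the equivalent plain C function has to take a pointer.

```c
/* Documented in the extended notation as:
     void DEMO_INCREMENT (int &counter);
   The macro modifies its argument directly, as if by reference. */
#define DEMO_INCREMENT(counter) do { (counter) += 1; } while (0)

/* The plain C equivalent: the caller passes a pointer explicitly. */
void
demo_increment (int *counter)
{
  *counter += 1;
}

/* Uses both forms on the same variable and returns the result. */
int
demo_use_both (void)
{
  int n = 0;
  DEMO_INCREMENT (n);     /* n is an lvalue and is modified in place */
  demo_increment (&n);    /* same effect, with an explicit pointer */
  return n;
}
```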
When to capitalize macros:
Avoid using unsigned int and unsigned long whenever
possible. Unsigned types are viral – any arithmetic or comparisons
involving mixed signed and unsigned types are automatically converted to
unsigned, which is almost certainly not what you want. Many subtle and
hard-to-find bugs are created by careless use of unsigned types. In
general, you should almost never use an unsigned type to hold a
regular quantity of any sort. The only exceptions are
Other reasonable uses of unsigned int and unsigned long
are representing non-quantities – e.g. bit-oriented flags and such.
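The “viral” behaviour is easy to demonstrate. The functions below are hypothetical examples written for this illustration; they assume only the usual arithmetic conversions of C, under which an int compared with an unsigned int is first converted to unsigned int.

```c
/* Both operands signed: the comparison behaves as expected. */
int
demo_signed_less (int a, int b)
{
  return a < b;
}

/* Mixed operands: a is converted to unsigned int before comparing,
   so a negative a becomes a very large positive number. */
int
demo_mixed_less (int a, unsigned int b)
{
  return a < b;
}

/* Unsigned subtraction wraps around instead of going negative, so
   this "is length greater than limit" test is true whenever the two
   values differ, including when length is smaller than limit. */
int
demo_unsigned_excess (unsigned int length, unsigned int limit)
{
  return length - limit > 0;
}
```

For example, demo_signed_less (-1, 1) is true, but demo_mixed_less (-1, 1) is false, and demo_unsigned_excess (2, 3) is true even though 2 is less than 3. Some compilers warn about the mixed comparison, which is one more reason to take such warnings seriously.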
Sometimes major textual changes are made to the source. This means that a search-and-replace is done to change type names and such. Some people disagree with such changes, and certainly such changes, if done without good reason, will just lead to headaches. But it’s important to keep the code clean and understandable, and consistent naming goes a long way towards this.
An example of the right way to do this was the so-called “great integral type renaming”.
11.9.1 Great Integral Type Renaming
11.9.2 Text/Char Type Renaming
The purpose of this is to rationalize the names used for various integral types, so that they match their intended uses and follow consistent conventions, and to eliminate types that were not semantically different from each other.
The conventions are:
int, and (as far as I can tell) of size_t (unsigned!) and ssize_t.
The only type below that is not an EMACS_INT is Hashcode, which is an
unsigned value of the same size as EMACS_INT.
For the actual name changes, see the script below.
I ran the following script to do the conversion. (NOTE: This script is idempotent. You can safely run it multiple times and it will not screw up previous results – in fact, it will do nothing if nothing has changed. Thus, it can be run repeatedly as necessary to handle patches coming in from old workspaces, or old branches.) There are two tags, just before and just after the change: ‘pre-integral-type-rename’ and ‘post-integral-type-rename’. When merging code from the main trunk into a branch, the best thing to do is first merge up to ‘pre-integral-type-rename’, then apply the script and associated changes, then merge from ‘post-integral-type-rename’ to the present. (Alternatively, just do the merging in one operation; but you may then have a lot of conflicts needing to be resolved by hand.)
Script ‘fixtypes.sh’ follows:
----------------------------------- cut ------------------------------------
files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
gr Memory_Count Bytecount $files
gr Lstream_Data_Count Bytecount $files
gr Element_Count Elemcount $files
gr Hash_Code Hashcode $files
gr extcount bytecount $files
gr bufpos charbpos $files
gr bytind bytebpos $files
gr memind membpos $files
gr bufbyte intbyte $files
gr Extcount Bytecount $files
gr Bufpos Charbpos $files
gr Bytind Bytebpos $files
gr Memind Membpos $files
gr Bufbyte Intbyte $files
gr EXTCOUNT BYTECOUNT $files
gr BUFPOS CHARBPOS $files
gr BYTIND BYTEBPOS $files
gr MEMIND MEMBPOS $files
gr BUFBYTE INTBYTE $files
gr MEMORY_COUNT BYTECOUNT $files
gr LSTREAM_DATA_COUNT BYTECOUNT $files
gr ELEMENT_COUNT ELEMCOUNT $files
gr HASH_CODE HASHCODE $files
----------------------------------- cut ------------------------------------
The ‘gr’ script, and the scripts it uses, are documented in ‘README.global-renaming’, because if placed in this file they would need to have their @ characters doubled, meaning you couldn’t easily cut and paste from the source.
In addition to those programs, I needed to fix up a few other things, particularly relating to the duplicate definitions of types, now that some types merged with others. Specifically:
--------------------------------- snip -------------------------------------
/* Counts of bytes or chars */

typedef EMACS_INT Bytecount;
typedef EMACS_INT Charcount;

/* Counts of elements */

typedef EMACS_INT Elemcount;

/* Hash codes */

typedef unsigned long Hashcode;

/* ------------------------ dynamic arrays ------------------- */
--------------------------------- snip -------------------------------------
--------------------------------- snip -------------------------------------
#endif

/* The have been some arguments over the what the type should be that specifies a count of bytes in a data block to be written out or read in, using
switch() statements, where XD_BYTECOUNT appears twice as a case tag.
In each case, the two case blocks contain identical code, and you
should *REMOVE THE SECOND* and leave the first.
The purpose of this was
Itext == text in internal format
Ibyte == a byte in such text
Ichar == a char as represented in internal character format
Thus e.g.
set_charptr_emchar -> set_itext_ichar
This was done using a script like this:
files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
gr Intbyte Ibyte $files
gr INTBYTE IBYTE $files
gr intbyte ibyte $files
gr EMCHAR ICHAR $files
gr emchar ichar $files
gr Emchar Ichar $files
gr INC_CHARPTR INC_IBYTEPTR $files
gr DEC_CHARPTR DEC_IBYTEPTR $files
gr VALIDATE_CHARPTR VALIDATE_IBYTEPTR $files
gr valid_charptr valid_ibyteptr $files
gr CHARPTR ITEXT $files
gr charptr itext $files
gr Charptr Itext $files
See above for the source to ‘gr’.
As in the integral-types change, there are pre and post tags before and after the change:
pre-internal-format-textual-renaming
post-internal-format-textual-renaming
When merging a large branch, follow the same sort of procedure documented above, using these tags – essentially sync up to the pre tag, then apply the script yourself, then sync from the post tag to the present. You can probably do the same if you don’t have a separate workspace, but do have lots of outstanding changes and you’d rather not just merge all the textual changes directly. Use something like this:
(WARNING: I’m not a CVS guru; before trying this, or any large operation that might potentially mess things up, DEFINITELY make a backup of your existing workspace.)
cup -r pre-internal-format-textual-renaming
<apply script>
cup -A -j post-internal-format-textual-renaming -j HEAD
This might also work:
cup -j pre-internal-format-textual-renaming
<apply script>
cup -j post-internal-format-textual-renaming -j HEAD
ben
The following is a script to go in the opposite direction:
files="*.[ch] s/*.h m/*.h config.h.in ../configure.in Makefile.in.in ../lib-src/*.[ch] ../lwlib/*.[ch]"
# Evidently Perl considers _ to be a word char ala \b, even though XEmacs
# doesn't. We need to be careful here with ibyte/ichar because of words
# like Richard,
To make a purified XEmacs, do: make puremacs.
To make a quantified XEmacs, do: make quantmacs.
You simply can’t dump Quantified and Purified images (unless using the portable dumper). Purify gets confused when xemacs frees memory in one process that was allocated in a different process on a different machine! Run it like so:
temacs -batch -l loadup.el run-temacs xemacs-args...
To make an XEmacs that can tell valgrind to do a memory leak check at
runtime, configure --with-valgrind. If XEmacs has been
configured --with-newgc, then valgrind must be invoked with
--vex-iropt-precise-memory-exns=yes in order to handle signals
properly.
Before you go through the trouble, are you compiling with all debugging and error-checking off? If not, try that first. Be warned that while Quantify is directly responsible for quite a few optimizations which have been made to XEmacs, doing a run which generates results which can be acted upon is not necessarily a trivial task.
Also, if you’re still willing to do some runs make sure you configure
with the ‘--quantify’ flag. That will keep Quantify from starting
to record data until after the loadup is completed and will shut off
recording right before it shuts down (which generates enough bogus data
to throw most results off). It also enables three additional elisp
commands: quantify-start-recording-data,
quantify-stop-recording-data and quantify-clear-data.
If you want to make XEmacs faster, target your favorite slow benchmark,
run a profiler like Quantify, gprof, or tcov, and figure
out where the cycles are going. In many cases you can localize the
problem (because a particular new feature or even a single patch
elicited it). Don’t hesitate to use brute force techniques like a
global counter incremented at strategic places, especially in
combination with other performance indications (e.g., degree of
buffer fragmentation into extents).
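The brute-force counter technique might look like the following sketch. The names demo_lookup_calls, demo_lookup() and demo_report_counters() are hypothetical and exist only for this example; in practice you would increment the counter inside whichever function you suspect and print it at exit or from a debugger.

```c
#include <stdio.h>

/* Global counter, incremented at the place under suspicion. */
long demo_lookup_calls;

/* A stand-in for some heavily used function: linear search in a table.
   Returns the index of key, or -1 if it is not present. */
int
demo_lookup (const int *table, int size, int key)
{
  int i;

  demo_lookup_calls++;          /* the "strategic place" */
  for (i = 0; i < size; i++)
    if (table[i] == key)
      return i;
  return -1;
}

/* Print the counter and return it, e.g. at the end of a benchmark run. */
long
demo_report_counters (FILE *out)
{
  fprintf (out, "demo_lookup called %ld times\n", demo_lookup_calls);
  return demo_lookup_calls;
}
```

A count that is far larger than expected for a given benchmark is often enough to point at the code path worth examining with a real profiler.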
Specific projects:
newline-and-indent. Syntax
highlighting needs to be rewritten to use a reliable, fast parser, then
to trust the pre-parsed structure, and only do re-highlighting locally
to a text change. Modern machines are fast enough to implement such
parsers in Lisp; but no machine will ever be fast enough to deal with
quadratic (or worse) algorithms!
Unfortunately, Emacs Lisp is slow, and is going to stay slow. Function calls in elisp are especially expensive. Iterating over a long list is going to be 30 times faster implemented in C than in Elisp.
To get started debugging XEmacs, take a look at the ‘.gdbinit’ and ‘.dbxrc’ files in the ‘src’ directory. See the section in the XEmacs FAQ on How to Debug an XEmacs problem with a debugger.
After making source code changes, run make check to ensure that
you haven’t introduced any regressions. If you want to make XEmacs more
reliable, please improve the test suite in ‘tests/automated’.
Did you make sure you didn’t introduce any new compiler warnings?
Before submitting a patch, please try compiling at least once with
configure --with-mule --use-union-type --error-checking=all
This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.