|
Searching XEmacs
Quick Links
About XEmacs
Getting XEmacs
Customizing XEmacs
Troubleshooting XEmacs
Developing XEmacs
|
|
|
Making Elisp Function Calls Faster
Owner: Martin
Effort: ???
Dependencies: ???
Abstract: This page describes many optimizations that can be
made to the existing Elisp function call mechanism without too much
effort. The most important optimizations can probably be implemented
with only a day or two of work. I think it's important to do this
work regardless of whether we eventually decide to replace the Lisp
engine.
Many complaints have been made about the speed of Elisp, and in
particular about the slowness in executing function calls, and rightly
so. If you look at the implementation of the funcall
function, you'll notice that it does an incredible amount of work.
Now logically, it doesn't need to be so. Let's look first from the
theoretical standpoint at what absolutely needs to be done to call a
Lisp function.
First, let's look at the situation that would exist if we were
smart enough to have made lexical scoping be the default language
policy. We know at compile time exactly which code can reference the
variables that are the formal parameters for the function being called
(specifically, only the code that is part of that function's
definition) and where these references are. As a result, we can
simply push all the values of the variables onto a stack, and convert
all the variable references in the function definition into stack
references. Therefore, binding lexically-scoped parameters in
preparation for a function call involves nothing more than pushing the
values of the parameters onto a stack and then setting a new value for
the frame pointer, at the same time remembering the old one. Because
the byte-code interpreter has a stack-based architecture, however, the
parameter values have already been pushed onto the stack at the time
of the function call invocation. Therefore, binding the variables
involves doing nothing at all, other than dealing with the frame
pointer.
With dynamic scoping, the situation is somewhat more complicated.
Because the parameters can be referenced anywhere, and these
references cannot be located at compile time, their values have to be
stored into a global table that maps the name of the parameter to its
current value. In Elisp, this table is called the obarray.
Variable binding in Elisp is done using the C function
specbind() . (This stands for "special variable binding"
where special is the standard Lisp terminology for a
dynamically-scoped variable.) What specbind() does,
essentially, is retrieve the old value of the variable out of the
obarray, remember the value by pushing it, along with the name of the
variable, onto what's called the specpdl stack, and then
store the new value into the obarray. The term "specpdl" means
Special Variable Pushdown List, where Pushdown List
is an archaic computer science term for a stack that used to be
popular at MIT. These binding operations, however, should still not
take very much time because of the use of symbols, i.e. because the
location in the obarray where the variable's value is stored has
already been determined (specifically, it was determined at the time
that the byte code was loaded and the symbol created), so no expensive
hash table lookups need to be performed.
An actual function invocation in Elisp does a great deal more work,
however, than was just outlined above. Let's just take a look at what
happens when one byte-compiled function invokes another byte-compiled
function, checking for places where unnecessary work is being done and
determining how to optimize these places.
The byte-compiled function's parameter list is stored in exactly
the format that the programmer entered it in, which is to say as a
Lisp list, complete with &optional and
&rest keywords. This list has to be parsed for
every function invocation, which means that for every element
in a list, the element is checked to see whether it's the
&optional or &rest keywords, its
surrounding cons cell is checked to make sure that it is indeed a cons
cell, the QUIT macro is called, etc. What should be
happening here is that the argument list is parsed exactly once, at
the time that the byte code is loaded, and converted into a C array.
The C array should be stored as part of the byte-code object. The C
array should also contain, in addition to the symbols themselves, the
number of required and optional arguments. At function call time, the
C array can be very quickly retrieved and processed.
For every variable that is to be bound, the
specbind() function is called. This actually does quite
a lot of things, including:
Checking the symbol argument to the function to make sure it's
actually a symbol.
Checking for specpdl stack overflow, and increasing its size as
necessary.
Calling symbol_value_buffer_local_info() to retrieve
buffer local information for the symbol, and then processing the
return value from this function in a series of if statements.
Actually storing the old value onto the specpdl stack.
Calling Fset() to change the variable's value.
The entire series of calls to specbind() should be
inline and merged into the argument processing code as a single tight
loop, with no function calls in the vast majority of cases. The
specbind() logic should be streamlined as follows:
The symbol argument type checking is unnecessary.
The check for the specpdl stack overflow needs to be done only
once, not once per argument.
All of the remaining logic should be boiled down as follows:
Retrieve the old value from the symbol's value cell.
If this value is a symbol-value-magic object, then call the
real specbind() to do the work.
Otherwise, we know that nothing complicated needs to be done, so
we simply push the symbol and its value onto the specpdl stack, and then
replace the value in the symbol's value cell.
The only logic that we are omitting is the code in
Fset() that checks to make sure a constant isn't being
set. These checks should be made at the time that the byte code for
the function is loaded and the C array of parameters to the function
is created. (Whether a symbol is constant or not is generally known
at XEmacs compile time. The only issue here is with symbols whose
names begin with a colon. These symbols should simply be disallowed
completely as parameter names.)
Other optimizations that could be done are:
At the beginning of the function that implements the byte-code
interpreter (this is the Lisp primitive byte-code ), the
string containing the actual byte code is converted into an array of
integers. I added this code specifically for MULE so that the
byte-code engine didn't have to deal with the complexities of the
internal string format for text. This conversion, however, is
generally useful because on modern processors accessing 32-bit values
out of an array is significantly faster than accessing unaligned 8-bit
values. This conversion takes time, though, and should be done once
at load time rather than each time the byte code is executed. This
array should be stored in the byte-code object. Currently, this is a
bit tricky to do, because byte-code is not actually
passed the byte-code object, but rather three of its elements. We
can't just change byte-code so that it is directly passed
the byte-code object because this function, with its existing argument
calling pattern, is called directly from compiled Elisp files. What
we can and should do, however, is create a subfunction that does take
a byte-code object and actually implements the byte-code interpreter
engine. Whenever the C code wants to execute byte code, it calls this
subfunction. byte-code itself also calls this
subfunction after conjuring up an appropriate byte-code object and
storing its arguments into this object. With a small amount of work,
it's possible to do this conjuring in such a way that it doesn't
generate any garbage.
At the end of a function call, the parameter bindings that have
been done need to be undone. This is standardly done by calling
unbind_to() . Just as for a specbind() , this
function does a lot of work that is unnecessary in the vast majority
of cases, and it could also be inlined and streamlined.
As part of each Elisp function call, a whole bunch of checks
are done for a series of unlikely but possible conditions that may
occur. These include, for example,
Calling the QUIT macro, which essentially involves
checking a global volatile variable to see whether additional processing
needs to be done.
Checking whether a garbage collection needs to be done.
Checking the variable debug_on_next_call .
Checking for whether Elisp profiling is active. (An additional
optimization that's perhaps not worth the effort is to do some
post-processing on the array of integers after it has been converted.
For example, whenever a 16-bit value occurs in the byte code, it has
to be encoded as two separate 8-bit values. These values could be
combined. The tricky part here is that all of the places where a goto
occurs across the place where this modification is made would have to
have their offsets changed. Other such optimizations can easily be
imagined as well.)
With a little bit smarter code, it should be possible to make a
single trip variable that indicates whether any of these conditions is
true. This variable would be updated by any code that changes the
actual variables whose values are checked in the various checks just
mentioned. (By the way, all of this is occurring in the C function
funcall_recording_as() .) There is a little bit of code
between each of the checks. This code would simply have to be
duplicated between the two cases where this general trip variable is
true and is false. (Note: the optimization detailed in this item is
probably not worth doing on the first pass.)
Ben Wing
|