This document is based on Jamie Zawinski’s xemacs wishlist. Throughout this page, “I” refers to Jamie.
The list has been substantially reformatted and edited to fit the needs of this site. If you have any soul at all, you’ll go check out the original. OK? You should also check out some other wishlists.
I’ve ranked these (roughly) from easiest to hardest; though of all of them, I think the debugger improvements would be the most useful. I think the combination of emacs+gdb is the best Unix development environment currently available, but it’s still lamentably primitive and extremely frustrating (much like Unix itself), especially if you know what kinds of features more modern integrated debuggers have.
Keyboard macros are one of the most useful concepts that emacs has to offer, but there’s room for improvement.
Often, I’ll define a keyboard macro, and then realize that I’ve left something out, or that there’s more that I need to do; for example, I may define a macro that does something to the current line, and then realize that I want to apply it to a lot of lines. So, I’d like this to work:
C-x (            ; start macro #1
...              ; (do stuff)
C-x )            ; done with macro #1
...              ; (do stuff)
C-x (            ; start macro #2
C-x e            ; execute macro #1 (splice it into macro #2)
C-s foo          ; move forward to the next spot
C-x )            ; done with macro #2
C-u 1000 C-x e   ; apply the new macro
That is, simply, one should be able to wrap new text around an existing macro. I can’t tell you how many times I’ve defined a complex macro but left out the “C-n C-a” at the end...
Yes, you can accomplish this with M-x name-last-kbd-macro, but that’s a pain. And it’s also more permanent than I’d often like.
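For reference, the named-macro workaround looks roughly like this (the name `my-step` is arbitrary):

```elisp
;; After recording macro #1 with C-x ( ... C-x ):
(name-last-kbd-macro 'my-step)   ; give the last macro a (permanent) name
;; While recording macro #2, splice it in with M-x my-step,
;; or run it from Lisp:
(execute-kbd-macro (symbol-function 'my-step))
```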
Right now, the act of defining a macro stops if you get an error while defining it, and all of the characters you’ve already typed into the macro are gone. It needn’t be that way. I think that, when that first error occurs, the user should be given the option of taking the last command off of the macro and trying again.
The macro-reader knows where the bounds of multi-character command sequences are, and it could even keep track of the corresponding undo records; rubbing out the previous entry on the macro could also undo any changes that command had made. (This should also work if the macro spans multiple buffers, and should restore window configurations as well.)
You’d want multi-level undo for this as well, so maybe the way to go would be to add some new key sequence which was used only as the back-up-inside-a-keyboard-macro-definition command.
I’m not totally sure that this would end up being very usable; maybe it would be too hard to deal with. Which brings us to:
I only just discovered edit-kbd-macro (C-x C-k). It is very, very cool.
The trick it does of showing the command which will be executed is somewhat error-prone, as it can only look up things in the current map or the global map; if the macro changed buffers, it wouldn’t be displaying the right commands. (One of the things I often use macros for is operating on many files at once, by bringing up a dired buffer of those files, editing them, and then moving on to the next.)
However, if the act of recording a macro also kept track of the actual commands that had gotten executed, it could make use of that info as well.
Another way of editing a macro, other than as text in a buffer, would be to have a command which single-steps a macro: you would lean on the space bar to watch the macro execute one character (command?) at a time, and then when you reached the point you wanted to change, you could do some gesture to either: insert some keystrokes into the middle of the macro and then continue; or to replace the rest of the macro from here to the end; or something.
Another similar hack might be to convert a macro to the equivalent lisp code, so that one could tweak it later in ways that would be too hard to do from the keyboard (wrapping parts of it in while loops or something.) (M-x insert-kbd-macro isn’t really what I’m talking about here: I mean insert the list of commands, not the list of keystrokes.)
In the spirit of the ‘teach-extended-commands-p’ variable, it would be interesting if emacs kept track of which commands I use most often, perhaps grouped by proximity or mode – it would then be more obvious which commands were the most likely candidates for placement on a toolbar, or popup menu, or just a more convenient key binding.
Bonus points if it figures out that I type “bt\n” and “ret\ny\n” into my ‘*gdb*’ buffer about a hundred thousand times a day.
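A minimal sketch of such command-frequency tracking (all `my-` names are invented; real code would also group by mode and persist the table):

```elisp
;; Count every command invocation in a hash table.
(defvar my-command-counts (make-hash-table :test 'eq)
  "Map of command symbols to invocation counts.")

(defun my-count-command ()
  (when (symbolp this-command)
    (puthash this-command
             (1+ (gethash this-command my-command-counts 0))
             my-command-counts)))

(add-hook 'post-command-hook 'my-count-command)

(defun my-top-commands ()
  "Return (COMMAND . COUNT) pairs, most-used first."
  (let (pairs)
    (maphash (lambda (k v) (push (cons k v) pairs)) my-command-counts)
    (sort pairs (lambda (a b) (> (cdr a) (cdr b))))))
```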
The thing that “File/Open...” pops up has excellent hack value, but as a user interface, it’s an abomination. Isn’t it time someone added a real file selection dialog already? (For the Motifly-challenged, the Athena-based file selector that GhostView uses seems adequate.)
It’s great that XEmacs has a toolbar, but it’s damn near impossible to customize it.
Currently, to define a toolbar button that has a text equivalent, one must edit a pixmap, and put the text there! That’s prohibitive. One should be able to add some kind of generic toolbar button, with a plain icon or none at all, but which has a text label, without having to use a paint program.
In my c-mode-hook, for example, I can add a couple of new keybindings, and delete a few others, and to do that, I don’t have to duplicate the entire definition of the c-mode-map. Making mode-local additions and subtractions to the toolbars should be as easy.
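The keybinding pattern described above, as a sketch (the hook function name and the particular bindings are invented):

```elisp
;; Tweak c-mode-map without redefining it wholesale; toolbar
;; customization ought to be this easy too.
(defun my-c-mode-tweaks ()
  (define-key c-mode-map "\C-c\C-e" 'compile)  ; add a binding
  (define-key c-mode-map "\C-c\C-d" nil))      ; delete another
(add-hook 'c-mode-hook 'my-c-mode-tweaks)
```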
The same situation holds for the right-mouse-button popup menu; one should be able to add new commands to those menus without difficulty. One problem is that each mode which does have a popup menu implements it in a different way...
About half of the work is done to make a replacement for the XmText widget which offloads editing responsibility to an external Emacs process. Someone should finish that. The benefit here would be that then, any Motif program could be linked such that all editing happened with a real Emacs behind it. (If you’re Athena-minded, flavor with Text instead of XmText – it’s probably easy to make it work with both.)
The part of this that is done already is the ability to run an Emacs screen on a Window object that has been created by another process (this is what the ‘ExternalClient.c’ and ‘ExternalShell.c’ stuff is.) What is left to be done is, adding the text-widget-editor aspects of this.
First, the emacs screen being displayed on that window would have to be one without a modeline, and one which behaved sensibly in the context of “I am a small multi-line text area embedded in a dialog box” as opposed to “I am a full-on text editor and lord of all that I survey.”
Second, the API that the (non-emacs-aware) user of the XmText widget expects would need to be implemented: give the caller the ability to pull the edited text string back out, and so on. The idea here being, hooking up emacs as the widget editor should be as transparent as possible.
Some of you may have seen my ‘gdb-highlight.el’ package, that I posted to gnu.emacs.sources last month. I think it’s really cool, but there should be a lot more work in that direction. For those of you who haven’t seen it, what it does is watch text that gets inserted into the ‘*gdb*’ buffer and make very nearly everything be clickable and have a context-sensitive menu. Generally, the types that are noticed are:
Any time one of those objects is presented in the ‘*gdb*’ buffer, it is mousable. Clicking middle button on it takes some default action (edits the function, selects the stack frame, disables the breakpoint, ...) Clicking the right button pops up a menu of commands, including commands specific to the object under the mouse, and/or other objects on the same line.
So that’s all well and good, and I get far more joy out of what this code does for me than I expected, but there are still a bunch of limitations. The debugger interface needs to do much, much more.
The idea behind gdbsrc-mode is on the side of the angels: one should be able to focus on the source code and not on the debugger buffer, absolutely. But the implementation is just awful.
First and foremost, it should not change “modes” (in the more general sense). Any commands that it defines should be on keys which are exclusively used for that purpose, not keys which are normally self-inserting. I can’t be the only person who usually has occasion to actually edit the sources which the debugger has chosen to display! Switching into and out of gdbsrc-mode is prohibitive.
I want to be looking at my sources at all times, yet I don’t want to have to give up my source-editing gestures. I think the right way to accomplish this is to put the gdbsrc commands on the toolbar and on popup menus; or to let the user define their own keys (I could see devoting my <kp_enter> key to “step”, or something common like that.)
Also it’s extremely frustrating that one can’t turn off gdbsrc mode once it has been loaded, without exiting and restarting emacs; that alone means that I’d probably never take the time to learn how to use it, without first having taken the time to repair it...
I want to be able to double-click on a variable name to highlight it, and then drag it to the debugger window to have its value printed.
I want gestures that let me write as well as read: for example, to store value A into slot B.
Any time there is a running gdb which has breakpoints, the buffers holding the lines on which those breakpoints are set should have icons in them. These icons should be context-sensitive: I should be able to pop up a menu to enable or disable them, to delete them, to change their commands or conditions.
I should also be able to move them. It’s annoying when you have a breakpoint with a complex condition or command on it, and then you realize that you really want it to be at a different location. I want to be able to drag-and-drop the icon to its new home.
The reason for all of this is that I spend entirely too much time scrolling around in the ‘*gdb*’ buffer; with gdb-highlight, I can just click on a line in the backtrace output to go to that frame, but I find that I spend a lot of time looking for that backtrace: since it’s mixed in with all the other random output, I waste time looking around for things (and usually just give up and type “bt” again, then thrash around as the buffer scrolls, and I try to find the lower frames that I’m interested in, as they have invariably scrolled off the window already...)
This would be especially handy given that gdb leaks like a sieve, and with a big program, I only get a few dozen relink-and-rerun attempts before gdb has blown my swap space.
When a program is recompiled and then reloaded into gdb, the breakpoints often end up in less-than-useful places. For example, when I edit text which occurs in a file anywhere before a breakpoint, emacs is aware that the line of the bp hasn’t changed, but just that it is in a different place relative to the top of the file. Gdb doesn’t know this, so your breakpoints end up getting set in the wrong places (usually the maximally inconvenient places, like after a loop instead of inside it). But emacs knows, so emacs should inform the debugger, and move the breakpoints back to the places they were intended to be.
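Emacs already has the machinery for this: a marker tracks its line across edits. A sketch of the idea (`gud-call` and `line-number-at-pos` are GNU Emacs functions from gud.el and simple.el; all `my-` names are invented, and a real implementation would track one marker per breakpoint):

```elisp
;; Remember where a breakpoint was set, via a marker that Emacs
;; keeps in step with edits above it.
(defvar my-bp-marker nil)

(defun my-set-bp-here ()
  "Set a gdb breakpoint at point and remember it with a marker."
  (interactive)
  (setq my-bp-marker (point-marker))
  (gud-call (format "break %s:%d"
                    (buffer-file-name)
                    (line-number-at-pos (point)))))

(defun my-resync-bp ()
  "After a recompile/reload, re-set the breakpoint where the marker now sits."
  (interactive)
  (when (and my-bp-marker (marker-buffer my-bp-marker))
    (with-current-buffer (marker-buffer my-bp-marker)
      (gud-call (format "break %s:%d"
                        (buffer-file-name)
                        (line-number-at-pos my-bp-marker))))))
```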
(Possibly the OOBR stuff does some of this, but I can’t tell, because I’ve never been able to get it to do anything but beep at me and mumble about environments. I find it pretty funny that the manual keeps explaining to me how intuitive it is, without actually giving me a clue how to launch it...)
It’d be nice to be able to create more complex dialog boxes from emacs-lisp: ones with checkboxes, radio button groups, text fields, and popup menus.
One of the things that the now-defunct Energize code (the C side of it, that is) could do was embed a dialog box between the toolbar and the main text area – buffers could have control panels associated with them, that had all kinds of complex behavior.
You know, I’ve encountered people who have been using emacs for years, and never use the mark stack for navigation. I can’t live without it; “C-u C-SPC” is among my most common gestures. Saved positions (point-to-register, C-x /) should be displayed differently (more prominently.)
The emacs GC is very primitive; it is also, fortunately, a rather well isolated module, and it would not be a very big task to swap it with a new one (once that new one was written, that is.) Someone should go bone up on modern GC techniques, and then just dive right in...
Yadda yadda, this list goes to eleven.
Subject: Re: XEmacs wishlist Date: Wed, 14 May 1997 16:18:23 -0700 From: Jamie Zawinski <jwz@netscape.com> Newsgroups: comp.emacs.xemacs, comp.emacs
Andreas Schwab wrote:
Use ‘C-u C-x (’:
start-kbd-macro:
Non-nil arg (prefix arg) means append to last macro defined; This begins by re-executing that macro as if you typed it again.
Cool, I didn’t know it did that...
But it only lets you append. I often want to prepend, or embed the macro multiple times (motion 1, C-x e, motion 2, C-x e, motion 3.)
Author: Ben Wing
I. DISTRIBUTION ISSUES
A. Unified Source Tarball.
Packages go under root/lib/xemacs/xemacs-packages and no one ever has to mess with --package-path, and the result can be moved from one directory to another pre- or post-install.
B. Unified Binary Tarballs with Packages.
Same principles as above.
If people complain, we can also provide split binary tarballs (architecture dependent and independent) and place these files in a subdirectory so as not to confuse the majority just looking for one tarball.
Under Windows, we need to provide a WISE-style GUI setup program. It’s already there but needs some work so you can select "all" packages easily (should be the default).
C. Parallel Root and Package Trees.
If the user downloads the main source and the packages separately, he will naturally untar them into the same directory. This results in the parallel root and package structure. We should support this as a "last resort," i.e., if we find no packages anywhere and are about to resign ourselves to not having packages, then look for a parallel package tree. The user who sets things up like this should be able to either run in place or "make install" and get a proper installed XEmacs. Never should the user have to touch --package-path.
II. WINDOWS PRINTING
Looks like the internals are done but not the GUI. This must be working in 21.2.
III. WINDOWS MULE
Basic support should be there. There’s already a patch to get things started and I’ll be doing more work to make this real.
IV. GUTTER ETC.
This stuff needs to be "stable" and generally free from bugs. Any APIs we create need to be well-reviewed or marked clearly as experimental.
V. PORTABLE DUMPER
Last bits need to be cleaned up. This should be made the "default" for a while to flush out problems. Under Microsoft Windows, Portable Dumper must be the default in 21.2 because of the problems with the existing dump process.
COMMENT: I’d like to feature freeze this pretty soon and create a 21.3 tree where all of my major overhauls of Mule-related stuff will go in. At around the same time, we need to do the move-around in the repository (or create a new one) and "upgrade" to the latest CVS server.
Author: Ben Wing
A while ago I created a package called Sysdep, which aimed to be a forward compatibility package for Elisp. The idea was that instead of having to write your package using the oldest version of Emacs that you wanted to support, you could use the newest XEmacs API, and then simply load the Sysdep package, which would automatically define the new API in terms of older APIs as necessary. The idea of this package was good, but its design wasn’t perfect, and it wasn’t widely adopted. I propose a new package called Compat that corrects the design flaws in Sysdep, and hopefully will be adopted by most of the major packages.
In addition, this package will provide macros that can be used to bracket code as necessary to disable byte compiler warnings generated as a result of supporting the APIs of different versions of Emacs; or rather the Compat package strives to provide useful constructs to make doing this support easier, and these constructs have the side effect of not causing spurious byte compiler warnings. The idea here is that it should be possible to create well-written, clean, and understandable Elisp that supports both older and newer APIs, and has no byte compiler warnings. Currently many warnings are unavoidable, and as a result, they are simply ignored, which also causes a lot of legitimate warnings to be ignored.
The approach taken by the Sysdep package to make sure that the newest API was always supported was fairly simple: when the Sysdep package was loaded, it checked for the existence of new API functions, and if they weren’t defined, it defined them in terms of older API functions that were defined. This had the advantage that the checks for which API functions were defined were done only once at load time rather than each time the function was called. However, the fact that the new APIs were globally defined caused a lot of problems with unwanted interactions, both with other versions of the Sysdep package provided as part of other packages, and simply with compatibility code of other sorts in packages that would determine whether an API existed by checking for the existence of certain functions within that API. In addition, the Sysdep package did not scale well because it defined all of the functions that it supported, regardless of whether or not they were used.
The Compat package remedies the first problem by ensuring that the new APIs are defined only within the lexical scope of the packages that actually make use of the Compat package. It remedies the second problem by ensuring that only definitions of functions that are actually used are loaded. This all works roughly according to the following scheme:
Part of the Compat package is a code generator, which is invoked through an eval-when-compile call within the package code itself. What the generator does is scan all of the Lisp code in the package, determine which function calls are made that the Compat package knows about, and generate custom compat code that conditionally defines just these functions when the package is loaded. The custom compat code can either be written to a separate Lisp file (for use with multi-file packages), or inserted into the beginning of the Lisp file of a single-file package. (In the latter case, the package indicates where this generated code should go through the use of magic comments that mark the beginning and end of the section. Some will say that doing this trick is bad juju, but I have done this sort of thing before, and it works very well in practice.)
All of the functions defined in the custom compat code have their names prefixed with both the name of the package and the word compat, ensuring that there will be no name-space conflicts with other functions in the same package, or with other packages that make use of the Compat package.
The actual definitions of the functions in the custom compat code are determined at run time. When the equivalent API already exists, the wrapper functions are simply defined directly in terms of the actual functions, so that the only run-time overhead from using the Compat package is one additional function call. (Alternatively, even this small overhead could be avoided by retrieving the definitions of the actual functions and supplying them as the definitions of the wrapper functions. However, this appears to me to not be completely safe. For example, it might have bad interactions with the advice package.)
Package code that uses the new APIs is bracketed by a call to the construct compat-execute. What this actually does is lexically bind all of the function names that are being redefined with macro functions, by using the Common Lisp macro macrolet. (The definition of this macro is in the CL package, but in order for things to work on all platforms, the definition of this macro will presumably have to be copied and inserted into the custom compat code.)
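A hand-written sketch of what the generated compat code and the macrolet wrapping might look like, using line-beginning-position as the example API (all `foo-` names are invented; a real generator would emit this automatically):

```elisp
;; Generated wrapper: prefixed with package name ("foo") plus "compat",
;; so it cannot collide with anything else.
(defun foo-compat-line-beginning-position ()
  (if (fboundp 'line-beginning-position)
      (line-beginning-position)                    ; real API exists: delegate
    (save-excursion (beginning-of-line) (point)))) ; fall back to old API

;; Package code is wrapped so the redefinition is lexical, not global:
;; calls to line-beginning-position inside the body expand to the
;; compat wrapper, and nothing outside this form is affected.
(macrolet ((line-beginning-position ()
             '(foo-compat-line-beginning-position)))
  (defun foo-do-something ()
    (goto-char (line-beginning-position))))
```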
In addition, the Compat package should define the macro compat-if-fboundp. (Similar macros such as compile-when-fboundp and compile-case-fboundp could be defined using similar principles.) The compat-if-fboundp macro behaves just like an (if (fboundp ...) ...) clause when executed, but in addition, when it’s compiled, it ensures that the code inside the if-true sub-block will not cause any byte-compiler warnings about the function in question being unbound. I think that the way to implement this would be to make compat-if-fboundp be a macro that does what it’s supposed to do, but which defines its own byte-code handler, which ensures that the particular warning in question will be suppressed. (Actually ensuring that just the warning in question is suppressed, and not any others, might be rather tricky. It certainly requires further thought.)
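The run-time half of compat-if-fboundp is trivial; a sketch (deliberately without the hard part, the compile-time warning suppression discussed above):

```elisp
;; Run like (if (fboundp FN) IF-TRUE IF-FALSE...); the compile-time
;; warning suppression would have to be layered on top of this.
(defmacro compat-if-fboundp (fn if-true &rest if-false)
  `(if (fboundp ,fn)
       ,if-true
     ,@if-false))

;; Usage: fall back gracefully when an API is missing.
(compat-if-fboundp 'frame-property
    (frame-property (selected-frame) 'name)
  "no frame-property in this Emacs")
```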
Note: An alternative way of avoiding both warnings about unbound
functions and warnings about obsolete functions is to just call the
function in question by using funcall
, instead of calling the
function directly. This seems rather inelegant to me, though, and
doesn’t make it obvious why the function is being called in such a
roundabout manner. Perhaps the Compat package should also provide a
macro compat-funcall
, which works exactly like funcall
,
but which indicates to anyone reading the code why the code is expressed
in such a fashion.
If you’re wondering how to implement the part of the Compat generator where it scans Lisp code to find function calls for functions that it wants to do something about, I think the best way is to simply process the code using the Lisp function read and recursively descend any lists, looking for function names as the first element of any list encountered. This might extract a few more functions than are actually called, but it is almost certainly safer than doing anything trickier, like byte-compiling the code and attempting to look for function calls in the result. (It could also be argued that the names of the functions should be extracted not only from the first element of lists, but anywhere a symbol occurs. For example, to catch places where a function is called using funcall or apply. However, such uses of functions would not be affected by the surrounding macrolet call, and so there doesn’t appear to be any point in extracting them.)
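A sketch of such a scanner (the `my-` names are invented): it reads top-level forms from the file and descends into sublists, reporting the head symbol of each list it encounters.

```elisp
;; Walk FORM, calling FN on the first element of every list,
;; and recursing into sublists (handles dotted tails).
(defun my-walk-form (form fn)
  (when (consp form)
    (when (symbolp (car form))
      (funcall fn (car form)))
    (while (consp form)
      (when (consp (car form))
        (my-walk-form (car form) fn))
      (setq form (cdr form)))))

(defun my-scan-calls (file known)
  "Return the subset of KNOWN (a list of symbols) apparently called in FILE."
  (let ((found nil))
    (with-temp-buffer
      (insert-file-contents file)
      (condition-case nil
          (while t                          ; read until end-of-file signals
            (my-walk-form (read (current-buffer))
                          (lambda (head)
                            (when (and (memq head known)
                                       (not (memq head found)))
                              (push head found)))))
        (end-of-file nil)))
    found))
```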
Author: Ben Wing
Abstract: I propose completely redoing the drag-n-drop interface to make it powerful and extensible enough to support such concepts as drag over and drag under visuals and context menus invoked when a drag is done with the right mouse button, to allow drop handlers to be defined for all sorts of graphical elements including buffers, extents, mode lines, toolbar items, menubar items, glyphs, etc., and to allow different packages to add and remove drop handlers for the same drop sites without interfering with each other. The changes are extensive enough that I think they can only be implemented in version 22, and the drag-n-drop interface should remain experimental until then.
The new drag-n-drop interface centers around the twin concepts of drop site and drop handler. A drop site specifies a particular graphical element where an object can be dropped onto, and a drop handler encapsulates all of the behavior that happens when such an object is dragged over and dropped onto a drop site.
Each drop site has an object associated with it which is passed to functions that are part of the drop handlers associated with that site. The type of this object depends on the graphical element that comprises the drop site. The drop site object can be a buffer, an extent, a glyph, a menu path, a toolbar item path, etc. (These last two object types are defined in Lisp Interface Changes, in the sections on menu and toolbar API changes. If we wanted to allow drops onto other kinds of drop sites, for example mode lines, we would have to create corresponding path objects.) Each such object type should be able to be accessed using the generalized property interface defined above, and should have a property called drop-handlers associated with it that specifies all of the drop handlers associated with the drop site. Normally, this property is not accessed directly, but instead through the drop handler API defined below, and Lisp packages should not make any assumptions about the format of the data contained in the drop-handlers property.
Each drop handler has an object of type drop-handler associated with it, whose primary purpose is to be a container for the various properties associated with a particular drop handler. These could include, for example, a function invoked when the drop occurs, a context menu invoked when a drop occurs as a result of a drag with the right mouse button, functions invoked when a dragged object enters, leaves, or moves within a drop site, the shape that the mouse pointer changes to when an object is dragged over a drop site that allows this particular object to be dropped onto it, the MIME types (actually a regular expression matching the MIME types) of the allowable objects that can be dropped onto the drop site, a package tag (a symbol specifying the package that created the drop handler, used for identification purposes), etc. The drop handler object is passed to the functions that are invoked as a result of a drag or a drop, most likely indirectly as one of the properties of the drag or drop event passed to the function. Properties of a drop handler object are accessed and modified in the standard fashion using the generalized property interface.
A drop handler is added to a drop site using the add-drop-handler function. The drop handler itself can either be created separately using the make-drop-handler function and then passed in as one of the parameters to add-drop-handler, or it will be created automatically by the add-drop-handler function if the drop handler argument is omitted but keyword arguments corresponding to the valid keyword properties for a drop handler are specified in the add-drop-handler call. Other functions, such as find-drop-handler, add-drop-handler (when specifying a drop handler before which the drop handler in question is to be added), remove-drop-handler, etc., should be defined with obvious semantics. All of these functions take or return a drop site object which, as mentioned above, can be one of several object types corresponding to graphical elements. Defined drop handler functions locate a particular drop handler using either the MIME-type or package-tag property of the drop handler, as defined above.
Logically, the drop handlers associated with a particular drop site are an ordered list. The first drop handler whose specified MIME type matches the MIME type of the object being dragged or dropped controls what happens to this object. This is important particularly because the specified MIME type of the drop handler can be a regular expression that, for example, matches all audio objects with any sub-type.
In the current drag-n-drop API, there is a distinction made between objects with an associated MIME type and objects with an associated URL. I think that this distinction is arbitrary, and should not exist. All objects should have a MIME type associated with them, and a new XEmacs-specific MIME type should be defined for URLs, file names, etc. as necessary. I am not even sure that this is necessary, however, as the MIME specification may specify a general concept of a pointer or link to an object, which is exactly what we want. Also in some cases (for example, the name of a file that is locally available), the pointer or link will have another MIME type associated with it, which is the type of the object that is being pointed to. I am not quite sure how we should handle URL and file name objects being dragged, but I am positive that it needs to be integrated with the mechanism used when an object itself is being dragged or dropped.
As is described in a separate page, the misc-user-event event type should be removed and split up into a number of separate event types. Two such event types would be drag-event and drop-event. A drop event is used when an object is actually dropped, and a drag event is used if a function is invoked as part of the dragging process. (Such a function would typically be used to control what are called drag under visuals, which are changes to the appearance of the drop site reflecting the fact that a compatible object is being dragged over it.) The drag events and drop events encapsulate all of the information that is pertinent to the drag or drop action occurring, including such information as the actual MIME type of the object in question, the drop handler that caused a function to be invoked, the mouse event (or possibly even a keyboard event) corresponding to the user’s action that is causing the drag or drop, etc. This event is always passed to any function that is invoked as a result of the drag or drop. There should never be any need to refer to the current-mouse-event variable, and in fact, this variable should not be changed at all during a drag or a drop.
Author: Ben Wing
Abstract: Apparently, if you know the name of a package (for example, fusion), you can load it using the require function, but there’s no standard way to turn it on or turn it off. The only way to figure out how to do that is to go read the source file, where hopefully the comments at the start tell you the appropriate magic incantations that you need to run in order to turn the extension on or off. There really need to be standard functions, such as enable-extension and disable-extension, to do this sort of thing. It seems like a glaring omission that this isn’t currently present, and it’s really surprising to me that nobody has remarked on this.
The easy part of this is defining the interface, and I think it should be done as soon as possible. When the package is loaded, it simply calls some standard function in the package system, and passes it the names of enable and disable functions, or perhaps just one function that takes an argument specifying whether to enable or disable. In any case, this data is kept in a table which is used by the enable-extension and disable-extension functions. There should also be functions such as extension-enabled-p and enabled-extension-list, and so on with obvious semantics. The hard part is actually getting packages to obey this standard interface, but this is mitigated by the fact that the changes needed to support this interface are so simple.
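The registration might look something like this; every name below is hypothetical, sketching the single-function variant with an action argument:

```elisp
;; A package registers one control function with the package system.
(register-extension 'fusion
  (lambda (action)
    (cond ((eq action 'enable-globally)  (fusion-install-hooks))
          ((eq action 'disable-globally) (fusion-remove-hooks))
          ((eq action 'enable-locally)   (fusion-mode 1))
          ((eq action 'disable-locally)  (fusion-mode 0)))))

;; Users (and the options menu) would then just say:
;; (enable-extension 'fusion)
;; (disable-extension 'fusion)
;; (extension-enabled-p 'fusion)
```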
I have been conceiving of these enabling and disabling functions as turning the feature on or off globally. It’s probably also useful to have a standard interface for turning an extension on or off in just a particular buffer. Perhaps then the appropriate interface would involve registering a single function that takes an argument that specifies various things, such as turn off globally, turn on globally, turn on or off in the current buffer, etc.
Part of this interface should specify the correct way to define global
key bindings. The correct rule for this, of course, is that the key
bindings should not happen when the package is loaded, which is often
how things are currently done, but only when the extension is actually
enabled. The key bindings should go away when the extension is
disabled. I think that in order to support this properly, we should
expand the keymap interface slightly, so that in addition to other
properties associated with each key binding is a list of shadow
bindings. Then there should be a function called define-key-shadowing, which is just like define-key but which also remembers the previous key binding in a shadow list. Then there can be another function, something like undefine-key, which restores the binding to the most recently added item on the shadow list.
There are already hash tables associated with each key binding, and it
should be easy to stuff additional values, such as a shadow list, into
the hash table. Probably there should also be functions called global-set-key-shadowing and global-unset-key-shadowing, with obvious semantics.
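A rough Lisp approximation of the shadowing idea follows. The proposal stores shadow lists in the per-binding hash tables; this sketch keeps them in a separate alist instead, which is illustrative only:

```lisp
(defvar shadowed-bindings nil
  "Alist mapping (KEYMAP . KEY) to a stack of shadowed definitions.")

(defun define-key-shadowing (keymap key def)
  "Like `define-key', but push the previous binding onto a shadow list."
  (let* ((slot (cons keymap key))
         (entry (assoc slot shadowed-bindings)))
    (unless entry
      (setq entry (cons slot nil))
      (push entry shadowed-bindings))
    (push (lookup-key keymap key) (cdr entry))
    (define-key keymap key def)))

(defun undefine-key (keymap key)
  "Restore the most recently shadowed binding for KEY in KEYMAP."
  (let ((entry (assoc (cons keymap key) shadowed-bindings)))
    (when (and entry (cdr entry))
      (define-key keymap key (pop (cdr entry))))))
```

An extension's enable function would then use define-key-shadowing for its global bindings, and its disable function would call undefine-key to restore whatever was there before.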
Once this interface is defined, it should be easy to expand the custom package so it knows about it. Then it will be possible to put all sorts of extensions on the options menu so that they can be turned on and off very easily, and when you save the options out to a file, the desired settings for whether these extensions are enabled are saved out with it. A whole lot of custom junk that's been added to a lot of different packages could be removed. After doing this, we might want to think of a way to classify extensions according to how likely we think the user will want to use them. This way we can avoid the problem of having a list of 100 extensions and the user not being able to figure out which ones might be useful. Perhaps the most useful extensions would appear immediately on the extensions menu, the less useful ones would appear in a submenu of that, and another submenu might contain even less useful extensions. Of course the package authors might not be too happy with this, but the users probably will be. This at least deserves some thought, although it's possible you might simply want to maintain a list on the web site of extensions along with judgments of, first, how commonly a user might want each extension, and second, how well-written and bug-free the package is. Both sorts of judgments could be obtained by doing user surveys if need be.
Author: Ben Wing
Abstract: A proposal is outlined for converting XEmacs to use
the .xemacs
subdirectory for its initialization files instead of
putting them in the user’s home directory. In the process, a general
pre-initialization scheme is created whereby all of the initialization
parameters, such as the location of the initialization files, whether
these files are loaded or not, where the initial frame is created,
etc. that are currently specified by command line arguments, by
environment variables, and other means, can be specified in a uniform
way using Lisp code. Reasonable default behavior for everything will
still be provided, and the older, simpler means can be used if desired.
Compatibility with the current location and name of the initialization
file, and the current ill-chosen use for the .xemacs
directory is
maintained, and the problem of how to gracefully migrate a user from the
old scheme into the new scheme while still allowing the user to use GNU
Emacs or older versions of XEmacs is solved. A proposal for changing
the way that the initial frame is mapped is also outlined; this would
allow the user’s initialization file to control the way that the initial
frame appears without resorting to hacks, while still making echo area
messages visible as they appear, and allowing the user to debug errors
in the initialization file.
normal-top-level
, and the general way that the user customizes
this process should also be done using Lisp code.
.xemacs
subdirectory), the name of the user init file, the
name of the custom init file, where and what type the initial device is,
whether and when the initial frame is mapped, etc. A standard interface
is provided for getting and setting the values of these properties using
functions such as set-pre-init-property, pre-init-property, etc. At various points during the
pre-initialization process, the value of many of these properties can be
undecided, which means that at the end of the process, the value of
these properties will be derived from other properties in some fashion
that is specific to each property.
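The property accessors themselves could be as simple as the following hypothetical sketch; the real proposal would need more bookkeeping, such as representing the undecided state and per-property derivation rules:

```lisp
(defvar pre-init-properties nil
  "Plist of pre-initialization properties.")

(defun set-pre-init-property (prop value)
  (setq pre-init-properties
        (plist-put pre-init-properties prop value)))

(defun pre-init-property (prop &optional default)
  (or (plist-get pre-init-properties prop) default))

;; For example, a pre-init file might contain something like:
;; (set-pre-init-property 'load-user-init-file-p nil)  ; like -q
```

The property names used in the example comment are invented for illustration.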
-q and -nw.
-pre-init, whose value is a Lisp expression to be evaluated at pre-initialization time, similar to the -eval command line switch. This allows any pre-initialization property to be set from the command line.
.xemacs sub-directory exists, and it's not obviously a package root (which probably means that it contains a file like init.el or pre-init.el, or, if neither of those files is present, that it doesn't contain any sub-directories or files that look like what would be in a package root), then it becomes the value of the init file directory. Otherwise the user's home directory is used.
.emacs. Otherwise, it's called init.el.
.xemacs-pre-init.el. Otherwise it's called pre-init.el. (One of the reasons for this rule has to do with the dialog box that might be displayed at startup. This will be described below.)
.xemacs-custom-init.el. Otherwise, it's called custom-init.el.
.xemacs-pre-init.el in the user's home directory is created or appended to with a line of Lisp code that sets up a pre-init property indicating that this dialog box shouldn't come up again. If the Yes option is chosen, then any package root files in .xemacs are moved into .xemacs/packages, the file .emacs is moved into .xemacs/init.el, and .emacs in the home directory becomes a symlink to this file. This way some compatibility is still maintained with GNU Emacs and older versions of XEmacs. The code that implements this has to be written very carefully to make sure that it doesn't accidentally delete or mess up any of the files that get moved around.
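The directory rule above might be sketched as follows. The function name is invented, and the real logic would also consult the pre-init properties rather than hard-coding the home directory:

```lisp
(defun init-file-directory ()
  "Return the directory in which to look for the user init file."
  (let ((dot-xemacs (expand-file-name ".xemacs" (getenv "HOME"))))
    (if (and (file-directory-p dot-xemacs)
             ;; Not obviously a package root: it contains init.el or
             ;; pre-init.el, or else contains nothing package-like.
             (or (file-exists-p (expand-file-name "init.el" dot-xemacs))
                 (file-exists-p (expand-file-name "pre-init.el" dot-xemacs))
                 (null (directory-files dot-xemacs nil "\\`[^.]" t))))
        dot-xemacs
      (getenv "HOME"))))
```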
The custom init file is where the custom package writes its options. This obviously needs to be a separate file from the standard init file. It should also be loaded before the init file rather than after, as is usually done currently, so that the init file can override these options if it wants to.
In addition to the above scheme, the way that XEmacs handles mapping the initial frame should be changed. However, this change should perhaps be delayed to a later version of XEmacs because of the user-visible changes it entails and the possible breakage in people's init files that might occur. (For example, if the rest of the scheme is implemented in 21.2, then this part of the scheme might want to be delayed until version 22.) The basic idea is that the initial frame is not created before the initialization file is run; instead, a banner frame is created containing the XEmacs logo, a button that allows the user to cancel the execution of the init file, and an area where messages output in the process of running this file are displayed. This area should contain a number of lines, which makes it better than the current scheme, where only the last message is visible. After the init file is done, the initial frame is mapped. This way the init file can make face changes and other such modifications that affect the initial frame, and then have the initial frame correctly come up with these changes, without any frame dancing or other problems that exist currently.
There should be a function that allows the initialization file to
explicitly create and map the first frame if it wants to. There should
also be a pre-init property that controls whether the banner frame
appears (of course it defaults to true), a property controlling when the
initial frame is created (before or after the init file, defaulting to
after), and a property controlling whether the initial frame is mapped
(normally true, but will be false if the -unmapped command line argument is given).
If an error occurs in the init file, then the initial frame should always be created and mapped at that time so that the error is displayed and the debugger has a place to be invoked.
Author: Ben Wing
NOTE: These changes are partly motivated by the various user-interface changes elsewhere in this document, and partly for Mule support. In general the various APIs in this document would benefit greatly from built-in keywords.
I would like to make keyword parameters an integral part of Elisp. The idea here is that you use the &key identifier in the parameter list of a function, and all of the following parameters specified are keyword parameters. This means that when these arguments
are specified in a function call, they are immediately preceded in the
argument list by a keyword, which is a symbol beginning with the
‘:’ character. This allows any argument to be specified independently
of any other argument with no need to place the arguments in any
particular order. This is particularly useful for functions that take
many optional parameters; using keyword parameters makes the code much
cleaner and easier to understand.
The cl package already provides keyword parameters of a sort, but I would like to make this more integrated and usable in a standard fashion. The interface that I am proposing is essentially compatible with the keyword interface in Common Lisp, but it may be a subset of the Common Lisp functionality, especially in the first implementation.
There is one departure from the Common Lisp specification that I would
like to make in order to make it much easier to add keyword parameters
to existing functions with optional parameters, and in general, to make
optional and keyword parameters coexist more easily. The Common Lisp
specification indicates that if a function has both optional and keyword
parameters, the optional parameters are always processed before the
keyword parameters. This means, for example, that if a function has
three required parameters, two optional parameters, and some number of
keyword parameters following, and the program attempts to call this
function by passing in the three required arguments, and then some
keyword arguments, the first keyword specified and the argument
following it get assigned to the first and second optional parameters as
specified in the function definition. This is certainly not what is
intended, and means that if a function defines both optional and keyword
parameters, any calls of this function must specify nil
for all
of the optional arguments before using any keywords. If the function
definition is later changed to add more optional parameters, all
existing calls to this function that use any keyword arguments will
break. This problem goes away if we simply process keyword parameters
before the optional parameters.
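To illustrate the proposed departure, consider a function declared with both optional and keyword parameters. The syntax below is the proposal, not currently valid Elisp (today something similar exists only via the cl package's defun*):

```lisp
;; Proposed native syntax (illustrative only):
(defun make-widget (name &optional width height &key color (border 1))
  (list name width height color border))

;; Under the proposed keyword-first rule, the call
;;   (make-widget "box" :color 'red)
;; binds COLOR to `red' and leaves WIDTH and HEIGHT nil.  Under strict
;; Common Lisp ordering it would instead bind WIDTH to `:color' and
;; HEIGHT to `red', which is never what the caller meant.
```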
The primary changes needed to support the keyword syntax are:
funcall function needs to be modified so that it knows how to process keyword parameters. This is the only place that will require very much intricate coding, and much of the logic that would need to be added can be lifted directly from the cl code.
DEFUN macro, and probably called DEFUN_WITH_KEYWORDS, needs to be defined so that built-in Lisp primitives containing keywords can be created. Now, the DEFUN_WITH_KEYWORDS macro should take an additional parameter which is a string, which consists of the part of the lambda list declaration for this primitive that begins with the &key specifier. This string is parsed in the DEFSUBR macro during XEmacs initialization, and is converted into the appropriate structure that needs to be stored into the subr object. In addition, the max_args parameter of the DEFUN macro needs to be incremented by the number of keyword parameters, and these parameters are passed to the C function simply as extra parameters at the end. The DEFSUBR macro can sort out the actual number of required, optional and keyword parameters that the function takes, once it has parsed the keyword parameter string. (An alternative that might make the declaration of a primitive a little bit easier to understand would involve adding another parameter to the DEFUN_WITH_KEYWORDS macro that specifies the number of keyword parameters. However, this would require some additional complexity in the preprocessor definition of the DEFUN_WITH_KEYWORDS macro, and probably isn't worth implementing.)
make-docfile program would have to be modified so that it generates the correct parameter lists for primitives defined using the DEFUN_WITH_KEYWORDS macro.
&rest and &key specifiers to parse their argument lists.
/* Example declaration: */
DEFUN_WITH_KEYWORDS (Ffoo, "foo", 2, 5, 6, ALLOW_OTHER_KEYWORDS,
                     (ichi, ARG_NIL), (ni, ARG_NIL), (san, ARG_UNBOUND), 0,
                     (arg1, arg2, arg3, arg4, arg5))
{ ... }
/* -> C function of 12 args:
      (arg1, ..., arg5, ichi, ..., roku, other keywords) */

/* Annotated example declaration: */
DEFUN_WITH_KEYWORDS (Ffoo, "foo",
  1, 2, 0,            /* min args, max args, and something that could be
                         REST, SPECIFY_DEFAULT or REST_SPEC */
  (bar, baz),         /* arg list */
  6, ALLOW_OTHER_SPECIFY_DEFAULT,
                      /* number of keywords, and one of ALLOW_OTHER,
                         SPECIFY_DEFAULT or ALLOW_OTHER_SPECIFY_DEFAULT */
  (ichi, 0), (ni, 0), (san, DEFAULT_UNBOUND),
  (shi, "t"), (go, "5"), (roku, "(current-buffer)"),
                      /* keyword arguments and their default values --
                         strings to be read into Lisp data during init,
                         then forms evalled at function call time */
  0)                  /* interactive spec */

/* With LO = Lisp_Object, this yields a C function:

   LO Ffoo (LO bar, LO baz, LO ichi, LO ni, LO san, LO shi, LO go,
            LO roku, int numkeywords, LO *other_keywords)
*/

/* Sketch of a possible preprocessor implementation: */
#define DEFUN_WITH_KEYWORDS (fun, funstr, minargs, maxargs, argspec,   \
                             args, num_keywords, keywordspec,          \
                             keywords, intspec)                        \
  LO fun (DWK_ARGS (maxargs, args)                                     \
          DWK_KEYWORDS (num_keywords, keywordspec, keywords))

#define DWK_KEYWORDS (num_keywords, keywordspec, keywords)             \
  DWK_KEYWORDS ## keywordspec (keywords)                               \
  DWK_OTHER_KEYWORDS ## keywordspec)

#define DWK_KEYWORDS_ALLOW_OTHER (x,y) DWK_KEYWORDS (x,y)
#define DWK_KEYWORDS_ALLOW_OTHER_SPECIFY_DEFAULT (x,y) \
  DWK_KEYWORDS_SPECIFY_DEFAULT (x,y)
#define DWK_KEYWORDS_SPECIFY_DEFAULT (numkey, key) ARGLIST_CAR ## numkey key
Author: Ben Wing
In my past work on XEmacs, I already expanded the standard property
functions of get, put, and remprop to work on objects other than symbols and defined an additional function object-plist for this interface. I'd like to expand this
interface further and advertise it as the standard way to make property
changes in objects, especially the new objects that are going to be
defined in order to support the added user interface features of version
22. My proposed changes are as follows:
get when it is unbound, which is to say that its value has not been explicitly specified. Note: the way to make a property unbound is to call remprop. Note also that for some built-in properties, setting the property to its default value is equivalent to making it unbound.
get function is modified. If the get function is called on a property that is unbound and the third, optional default argument is nil, then the default value of the property is returned. If the default argument is not nil, then whatever was specified as the value of this argument is returned. For the most part, this is upwardly compatible with the existing definition of get, because all user-defined properties have an initial default value of nil. Code that calls the get function, specifies nil for the default argument, and expects to get nil returned if the property is unbound is almost certainly wrong anyway.
get1 is defined. This function does not take a default argument like the get function. Instead, if the property is unbound, an error is signaled. Note: get can be implemented in terms of get1.
property-default-value and property-bound-p are defined with the obvious semantics.
property-built-in-p is defined, which takes two arguments, the first being a symbol naming an object type and the second specifying a property, and indicates whether the property name has a built-in meaning for objects of that type.
put function should signal an error, such as undefined-property, when given any property other than those that are predefined.
user-defined-properties-allowed-p should be defined, with the obvious semantics. (See the previous item.)
built-in-property-name-list, property-name-list, and user-defined-property-name-list.
Another idea:
(define-property-method TYPE-SPEC ...)

where TYPE-SPEC is one of:
  OBJECT-TYPE    -- a symbol naming an object type
  PREDICATE      -- a predicate function
  cons           -- all cons cells
  :KEYWORD       -- all lists beginning with KEYWORD

and the remaining arguments are keyword/function pairs:
  :put PUTFUN
  :get GETFUN
  :remprop REMPROPFUN
  :object-props OBJECT-PROPS-FUN
  :clear-properties CLEAR-PROPERTIES-FUN
  :map-properties MAP-PROPERTIES-FUN

e.g.

(define-property-method 'hash-table
  :put #'(lambda (obj key value) (puthash key obj value)))
43.8.1 Future Work – Easier Toolbar Customization
43.8.2 Future Work – Toolbar Interface Changes
Author: Ben Wing
Abstract: One of XEmacs’ greatest strengths is its ability to be customized endlessly. Unfortunately, it is often too difficult to figure out how to do this. There has been some recent work like the Custom package, which helps in this regard, but I think there’s a lot more work that needs to be done. Here are some ideas (which certainly could use some more thought).
Although there is currently an edit-toolbar package, it is not well integrated with XEmacs, and in general it is much too hard to
customize the way toolbars look. I would like to see an interface that
works a bit like the way things work under Windows, where you can
right-click on a toolbar to get a menu of options that allows you to
change aspects of the toolbar. The general idea is that if you
right-click on an item itself, you can do things to that item, whereas
if you right-click on a blank part of a toolbar, you can change the
properties of the toolbar. Some of the items on the right-click menu
for a particular toolbar button should be specified by the button
itself. Others should be standard. For example, there should be an
Execute item which simply does what would happen if you
left-click on a toolbar button. There should probably be a
Delete item to get rid of the toolbar button and a
Properties item, which brings up a property sheet that allows
you to do things like change the icon and the command string that’s
associated with the toolbar button.
The options to change the appearance of the toolbar itself should
probably appear both on the context menu for specific buttons, and on
the menu that appears when you click on a blank part of the toolbar.
That way, if there isn’t a blank part of the toolbar, you can still
change the toolbar appearance. As for what appears in these items, in
Outlook Express, for example, there are three different menu items, one
of which is called Buttons, which pops up a window that allows you to edit the toolbar; for us this could pop up a new frame running edit-toolbar.el. The second item is
called Align, which contains a submenu that says Top,
Bottom, Left, and Right, which will be just
like setting the default toolbar position. The third one says
Text Labels, which would just let you select whether there are
captions or not. I think all three of these are useful and are easy to
implement in XEmacs. These things also need to be integrated with
custom so that a user can control whether these options apply to all
sessions, and in such a case can save the settings out to an options
file. edit-toolbar.el in particular needs to integrate with custom. Currently it has some sort of hokey stuff of its own, which it saves out to a .toolbar file. Another useful option to have,
once we draw the captions dynamically rather than using pre-generated
ones, would be the ability to change the font size of the captions. I’m
sure that Kyle, for one, would appreciate this.
(This is incomplete.....)
Author: Ben Wing
I propose changing the way that toolbars are specified to make them more flexible.
:help-echo, :context-menu, :drop-handlers, and :enabled-p. The :enabled-p and :help-echo keyword arguments are the same as the third and fourth items in the old toolbar item vector format. The :context-menu keyword is a list in standard menu format that specifies additional items that will appear when the context menu for the toolbar item is popped up. (Typically, this happens when the right mouse button is clicked on the toolbar item.) The :drop-handlers keyword is for use by the new drag-n-drop interface (see Drag-n-Drop Interface Changes), and is not normally specified or modified directly.
:captioned-p (whether the captions are visible under the toolbar), :glyphs-visible-p (whether the toolbar glyphs are visible), and :context-menu (additional items that will appear on the context menus for all toolbar items, and additionally on the context menu that is popped up when the right mouse button is clicked over a portion of the toolbar that does not have any toolbar buttons in it). The current standard practice with regard to such properties seems to be to have separate specifiers, such as left-toolbar-width, right-toolbar-width, left-toolbar-visible-p, right-toolbar-visible-p, etc. It
could easily be argued that there should be no such toolbar specifiers
and that all such properties should be part of the toolbar instantiator
itself. In this scheme, the only separate specifiers that would exist
for individual properties would be default values. There are a lot of
reasons why an interface change like this makes sense. For example,
currently when VM sets its toolbar, it also sets the toolbar width and
similar properties. If you change which edge of the frame the VM
toolbar occurs in, VM will also have to go and modify all of the
position-specific toolbar specifiers for all of the other properties
associated with a toolbar. It doesn’t really seem to make sense to me
for the user to be specifying the width and visibility and such of
specific toolbars that are attached to specific edges because the user
should be free to move the toolbars around and expect that all of the
toolbar properties automatically move with the toolbar. (It is also easy
to imagine, for example, that a toolbar might not be attached to the
edge of the frame at all, but might be floating somewhere on the user’s
screen). With an interface where these properties are separate
specifiers, this has to be done manually. Currently, having the various
toolbar properties be inside of toolbar instantiators makes them
difficult to modify, but this will be different with the API that I
propose below.
toolbar-path and toolbar-item-path, respectively) whose properties specify the location in a toolbar instantiator where changes to the instantiator can be made. A toolbar path, for example, would be created using the make-toolbar-path function, which takes a toolbar specifier (or optionally a symbol, such as left, right, default, or nil, which refers to a particular toolbar), and optionally parameters such as the locale and the tag set, which specify which actual instantiator inside of the toolbar specifier is to be modified. A toolbar item path is created similarly using a function called make-toolbar-item-path, which takes a toolbar specifier and a string naming the caption of the toolbar item to be modified, as well as, of course, optionally the locale and tag set parameters and such.
The usefulness of these path objects is as arguments to functions that
will use them as pointers to the place in a toolbar instantiator where
the modification should be made. Recall, for example, the generalized
property interface described above. If a function such as get or put is called on a toolbar path or toolbar item path, it will use
the information contained in the path object to retrieve or modify a
property located at the end of the path. The toolbar path objects can
also be passed to new functions that I propose defining, such as add-toolbar-item, delete-toolbar-item, and find-toolbar-item. These functions should be parallel to the
functions for inserting, deleting, finding, etc. items in a menu. The
toolbar item path objects can also be passed to the drop-handler
functions defined in Drag-n-Drop Interface Changes to retrieve or modify the drop handlers that are associated
with a toolbar item. (The idea here is that you can drag an object and
drop it onto a toolbar item, just as you could onto a buffer, an extent,
a menu item, or any other graphical element).
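A hypothetical usage sketch of these path objects follows. None of these functions exist yet; the names follow the proposal, and the icon, function, and caption in the example are invented:

```lisp
;; Modify a property of the left toolbar's instantiator:
(put (make-toolbar-path 'left) :captioned-p t)

;; Add an item; the item vector follows the old
;; [GLYPH FUNCTION ENABLED-P HELP] format.
(add-toolbar-item (make-toolbar-path 'left)
                  [toolbar-mail-icon toolbar-mail t "Read mail"])

;; Adjust a property of one particular item, named by its caption:
(put (make-toolbar-item-path 'left "Mail") :help-echo "Read your mail")
```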
default-toolbar-context-menu according to the rules defined above) should contain entries allowing the user to modify the appearance of a toolbar. Entries would include, for example, whether the toolbar is captioned, whether the glyphs for the toolbar are visible (if the toolbar is captioned but its glyphs are not visible, the toolbar appears as nothing but text; you can set things up this way, for example, in Netscape), an option that brings up a package for editing the contents of a toolbar, an option to allow the caption face to be changed (perhaps through an edit-faces or custom interface), etc.
Author: Ben Wing
default-menubar and should replace the existing current-menubar variable. This would increase the power of the menubar interface and bring it in line with the toolbar interface. (In order to provide proper backward compatibility, we might have to complete the symbol value handler mechanism.)
menu-path) can be created using the make-menu-path function, and specifies a location in a particular
menu instantiator where changes can be made. The first argument to
make-menu-path specifies which menu to modify and can be a specifier, a value such as nil (which means to modify the default menubar associated with the selected frame), or perhaps some other kind of specification referring to some other menu, such as the context menus invoked by the right mouse button. The second argument to make-menu-path, also required, is a list of zero or more strings
that specifies the particular menu or menu item in the instantiator that
is being referred to. The remaining arguments are optional and would be
a locale, a tag set, etc. The menu path object can be passed to get, put, or other standard property functions to access or modify particular properties of a menu or a menu item. It can also be passed to expanded versions of the existing functions such as find-menu-item, delete-menu-item, add-menu-button, etc. (It is really a shame that add-menu-item is an obsolete function, because it is a much better name than add-menu-button.)
Finally, the menu path object can be passed to the drop-handler
functions described in Drag-n-Drop Interface Changes to access or modify the drop handlers that are associated with
a particular menu item.
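A hypothetical sketch of how such a menu path might be used (make-menu-path and the path-aware versions of get, put, and delete-menu-item do not exist yet; the property names are illustrative):

```lisp
;; Point at the "Open..." item on the default menubar of the
;; selected frame:
(let ((path (make-menu-path nil '("File" "Open..."))))
  ;; Read and modify properties of that menu item:
  (get path :help-echo)
  (put path :help-echo "Visit a file in a new buffer")
  ;; Use it with the expanded structural functions:
  (delete-menu-item path))
```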
:help-echo, :context-menu, and :drop-handlers, with similar semantics to the corresponding keywords for toolbar items. (It may seem a bit strange at first to have
a context menu associated with a particular menu item, but it is a user
interface concept that exists both in Open Look and in Windows, and
really makes a lot of sense if you give it a bit of thought). These
properties may not actually be implemented at first, but at least the
keywords for them should be defined.
Author: Ben Wing
Abstract: This page describes why the misc-user event type should be split up into a number of different event types, and how to do this.
The misc-user event should not exist as a single event type. It should
be split up into a number of different event types: one for scrollbar
events, one for menu events, and one or two for drag-n-drop events.
Possibly there will be other event types created in the future. The
reason for this is that the misc-user event was a bad design choice when
I made it, and it has only gotten worse with Oliver’s attempts to add
features to it to make it be used for drag-n-drop. I know that there
was originally a separate drag-n-drop event type, and it was folded into
the misc-user event type on my recommendation, but I have now realized
the error of my ways. I had originally created a single event type in
an attempt to prevent some Lisp programs from breaking because they
might have a case statement over various event types, and would not be
able to handle new event types appearing. I think now that these
programs simply need to be written in a way to handle new event types
appearing. It’s not very hard to do this. You just use predicates
instead of doing a case statement over the event type. If we preserve
the existing predicate called misc-user-event-p
, and just make
sure that it evaluates to true when given any user event type other than
the standard simple ones, then most existing code will not break either
when we split the event types up like this, or if we add any new event
types in the future.
More specifically, the only clean way to design the misc-user event type would be to add a sub-type field to it, and then have the nature of all the other fields in the event type be dependent on this sub-type. But then in essence, we’d just be reimplementing the whole event-type scheme inside of misc-user events, which would be rather pointless.
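The predicate style the text recommends looks like this. The handler functions are invented for illustration, but the event predicates themselves already exist in XEmacs:

```lisp
(defun my-handle-event (event)
  ;; Dispatch with predicates rather than an exhaustive case over
  ;; event types; this keeps working when new event types appear.
  (cond ((key-press-event-p event)
         (my-handle-key event))
        ((or (button-press-event-p event)
             (button-release-event-p event))
         (my-handle-button event))
        ((misc-user-event-p event)
         ;; True for any non-simple user event: scrollbar, menu,
         ;; drag-n-drop, and whatever types are added in the future.
         (my-handle-other-user-event event))
        (t nil)))
```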
43.11.1 Future Work – Abstracted Mouse Pointer Interface
43.11.2 Future Work – Busy Pointer
Author: Ben Wing
Abstract: We need to create a new image format that allows
standard pointer shapes to be specified in a way that works on all windowing systems. I suggest that this be called pointer, which has one tag associated with it, named :data, and whose value is a string. The possible strings that can be specified here are predefined by XEmacs, and are guaranteed to work across all windowing systems. This means that we may need to provide our own definition for pointer shapes that are not standard on some systems. In particular, there are a lot more standard pointer shapes under X than under Windows, and most of these pointer shapes are fairly useful. There are also a few pointer shapes (the hand, for example, I think) on Windows, but not on X.
Converting the X pointer shapes to Windows should be easy because the
definitions of the pointer shapes are simply XBM files, which we can
read under Windows. Going the other way might be a little bit more
difficult, but it should still not be that hard.
While we’re at it, we should change the image format currently called cursor-font to x-cursor-font, because it only works under X Windows. We also need to change the format called resource to be mswindows-resource. At least in the case of cursor-font, the old value should be maintained for compatibility as an obsolete alias. The resource format was added so recently that it’s possible that we can just change it.
Author: Ben Wing
Automatically make the mouse pointer switch to a busy shape (watch signal) when XEmacs has been "busy" for more than, e.g. 2 seconds. Define the busy time as the time since the last time that XEmacs was ready to receive input from the user. An implementation might be:
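One hedged sketch of such an implementation, using XEmacs' add-timeout and disable-timeout together with the pointer image format proposed above; the glyph name and the hook points from which these functions are called are invented:

```lisp
(defvar busy-pointer-delay 2
  "Seconds of busyness before the busy pointer appears.")
(defvar busy-pointer-timeout nil)

(defun busy-pointer-arm ()
  ;; Hypothetically called when XEmacs stops being ready for input.
  (setq busy-pointer-timeout
        (add-timeout busy-pointer-delay
                     #'(lambda (ignore)
                         (set-glyph-image busy-pointer-glyph
                                          [pointer :data "watch"]))
                     nil)))

(defun busy-pointer-disarm ()
  ;; Hypothetically called each time XEmacs is ready for input again.
  (when busy-pointer-timeout
    (disable-timeout busy-pointer-timeout)
    (setq busy-pointer-timeout nil))
  (set-glyph-image busy-pointer-glyph [pointer :data "left_ptr"]))
```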
43.12.1 Future Work – Everything should obey duplicable extents
Author: Ben Wing
A lot of functions don’t properly track duplicable extents. For
example, the concat function does, but the format function does not, and extents in keymap prompts are not displayed either. All
does not, and extents in keymap prompts are not displayed either. All
of the functions that generate strings or string-like entities should
track the extents that are associated with the strings. Currently this
is difficult because there is no general mechanism implemented for doing
this. I propose such a general mechanism, which would not be hard to
implement, and would be easy to use in other functions that build up
strings.
The basic idea is that we create a C structure that is analogous to a Lisp string in that it contains string data and lists of extents for that data. Unlike standard Lisp strings, however, this structure (let’s call it lisp_string_struct) can be incrementally updated, and its allocation is handled explicitly so that no garbage is generated. (This is important, for example, in the event-handling code, which would want to use this structure but must not generate any garbage, for efficiency reasons.) Both the string data and the list of extents in this string are handled using dynarrs, so that it is easy to incrementally update this structure. Functions should exist to create and destroy instances of lisp_string_struct, to generate a Lisp string from a lisp_string_struct and vice versa, to append a substring of a Lisp string to a lisp_string_struct, to just append characters to a lisp_string_struct, etc. The only possibly tricky part of implementing these functions is the copying of extents from a Lisp string into a lisp_string_struct. However, there is already a function, copy_string_extents(), that does basically this exact thing, and it should be easy to create a modified version of it.
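As a rough illustration of the proposed structure, here is a minimal C sketch using plain realloc-grown arrays in place of the dynarr machinery; everything besides the name lisp_string_struct and the general shape is invented for this example, and extent properties are omitted.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of lisp_string_struct: a growable string buffer
   plus a parallel growable array of extent records.  The real version
   would use the existing dynarr machinery and full extent objects. */

typedef struct { int start, end; /* plus extent properties */ } extent_rec;

typedef struct
{
  char *data;          size_t len, data_cap;
  extent_rec *extents; size_t nextents, ext_cap;
} lisp_string_struct;

lisp_string_struct *make_lisp_string_struct (void)
{
  return calloc (1, sizeof (lisp_string_struct));
}

/* Append raw characters, growing the buffer as needed. */
void lss_append_chars (lisp_string_struct *s, const char *p, size_t n)
{
  if (s->len + n > s->data_cap)
    {
      s->data_cap = (s->len + n) * 2;
      s->data = realloc (s->data, s->data_cap);
    }
  memcpy (s->data + s->len, p, n);
  s->len += n;
}

/* Record an extent covering [start, end) of the accumulated data. */
void lss_add_extent (lisp_string_struct *s, int start, int end)
{
  if (s->nextents >= s->ext_cap)
    {
      s->ext_cap = s->ext_cap ? s->ext_cap * 2 : 4;
      s->extents = realloc (s->extents, s->ext_cap * sizeof (extent_rec));
    }
  s->extents[s->nextents].start = start;
  s->extents[s->nextents].end = end;
  s->nextents++;
}

void free_lisp_string_struct (lisp_string_struct *s)
{
  free (s->data); free (s->extents); free (s);
}
```

Converting to and from Lisp strings, and copying extents across, would be layered on top of these primitives.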
Author: Ben Wing
Abstract: The purpose of this proposal is to present a coherent plan for how development branches in XEmacs are managed. This will cover such issues as stable versus experimental branches, creating new branches, synchronizing patches between branches, and how version numbers are assigned to branches.
A development branch is defined to be a linear series of releases of the XEmacs code base, each of which is derived from the previous one. When the XEmacs development tree is forked and two branches are created where there used to be one, the branch that is intended to be more stable and have fewer changes made to it is considered the one that inherits the parent branch, and the other branch is considered to have begun at the branching point. The less stable of the two branches will eventually be forked again, while this usually will not happen to the more stable of the two, and its development will eventually come to an end. This means that every branch has a definite ending point. For example, the 20.x branch began when the released 19.13 code tree was split into a 19.x and a 20.x branch, and the 20.x branch will end when the last 20.x release (probably numbered 20.5 or 20.6) is released.
I think that there should always be three active development branches at any time. These can be designated the stable, the semi-stable, and the experimental branches. This situation has existed in the current code tree ever since the 21.0 development branch was split. In this situation, the stable branch is the 20.x series; the semi-stable branch is the 21.0 release and the stability releases that follow it; and the experimental branch is the one created as the result of the 21.0 development branch split. Typically, the stable branch has been released for a long period of time, the semi-stable branch has been released for a short period of time or is about to be released, and the experimental branch has not yet been released and will probably not be released for a while. The conditions that should hold in all circumstances are:
The reason for the second condition is to ensure that active development can always proceed and is never throttled, as is happening currently at the end of the 21.0 release cycle. What this means is that as soon as the experimental branch is deemed to be stable enough to go into feature freeze:
The stable branch is always in high resistance, which is to say that the only changes that can be made to the code are important bug fixes involving a small amount of code where it should be clear just by reading the code that no destabilizing code has been introduced. The semi-stable branch is in low resistance, which means that no major features can be added, but except right before a release fairly major code changes are allowed. Features can be added if they are sufficiently small, if they are deemed sufficiently critical due to severe problems that would exist if the features were not added (for example, replacement of the unexec mechanism with a portable solution would be a feature that could be added to the semi-stable branch provided that it did not involve an overly radical code re-architecture, because otherwise it might be impossible to build XEmacs on some architectures or with some compilers), or if the primary purpose of the new feature is to remedy an incompleteness in a recent architectural change that was not finished in a prior release due to lack of time (for example, abstracting the mouse pointer and list-of-colors interfaces, which were left out of 21.0). There is no feature resistance in place in the experimental branch, which allows full development to proceed at all times.
In general, both the stable and semi-stable branches will contain previous net releases. In addition, there will be beta releases in all three branches, and possibly development snapshots between the beta releases. It’s obviously necessary to have a good version numbering scheme in order to keep everything straight.
First of all, it needs to be immediately clear from the version number whether the release is a beta release or a net release. Steve has proposed getting rid of the beta version numbering system, which I think would be a big mistake. Furthermore, the net release version number and beta release version number should be kept separate, just as they are now, to make it completely clear where any particular release stands. There may be alternate ways of phrasing a beta release other than something like 21.0 beta 34, but in all such systems, the beta number needs to be zero for any release version. Three possible alternative systems, none of which I like very much, are:
Currently, the between-beta snapshots are not numbered, but I think they probably should be. If appropriate scripts are put in place to automate beta releases, it should be very easy to have a version number automatically updated whenever a snapshot is made. The number could be added as a separate snapshot number, so you’d have 21.0 beta 34 pre 1, which comes before 21.0 beta 34; or we could make the beta number floating point, in which case the same snapshot would have to be called 21.0 beta 33.1. The latter solution seems quite kludgey to me.
There also needs to be a clear way to distinguish, when a net release is made, which branch the release is a part of. Again, three solutions come to mind:
With three active development branches, synchronizing code changes between the branches is obviously somewhat of a problem. To make things easier, I propose a few general guidelines:
The xemacs.org Website
Author: Ben Wing
The xemacs.org web site is the face that XEmacs presents to the outside world. In my opinion, its most important function is to present information about XEmacs in a way that attracts new XEmacs users and co-contributors. Existing members of the XEmacs community can probably find out most of the information they want to know about XEmacs regardless of what shape the web site is in, or for that matter, perhaps even if the web site didn’t exist at all. However, potential new users and co-contributors who go to the XEmacs web site and find it out of date and/or lacking the information they need are likely to be turned away and may never return. For this reason, I think it’s extremely important that the web site be up-to-date, well-organized, and full of information that an inquisitive visitor is likely to want to know.
The current XEmacs web site needs a lot of work if it is to meet these standards. I don’t think it’s reasonable to expect one person to do all of this work and make continual updates as needed, especially given the dismal record that the XEmacs web site has had. The proper thing to do is to place the web site itself under CVS and allow many of the core members to remotely check files in and out. This way, for example, Steve could update the part of the site that contains the current release status of XEmacs. (Much of this could be done by a script that Steve executes when he sends out a beta release announcement, which automatically HTML-izes the mail message and puts it in the appropriate place on the web site. There are programs specifically designed to convert email messages into HTML, for example mhonarc.) Meanwhile, the xemacs.org mailing list administrator (currently Jason Mastaler, I think) could maintain the part of the site that describes the various mailing lists and other addresses at xemacs.org. Someone like me (perhaps through a proxy typist) could maintain the part of the site that specifies the future directions that XEmacs is going in, etc.
Here are some things that I think it’s very important to add to the web site.
configure in order for XEmacs to link with and make use of these libraries or of Motif or CDE. Finally, this page should list which versions of the various libraries are required for use with the various beta versions of XEmacs. (Remember, this can change from beta to beta, and someone needs to keep a watchful eye on this.)
xemacs.org and who is the maintainer or maintainers for each of these packages.
We should try to keep an XEmacs presence in all of the major places on the web that are devoted to free software or to the "open source" community. This includes, for example, the open source web site at http://opensource.oreilly.com (I’m already in the process of contacting this site), the Freshmeat site at http://www.freshmeat.net, the various announcement news groups (for example, comp.os.linux.announce, and the Windows announcement news group) etc.
43.15.1 Future Work – Keybinding Schemes
43.15.2 Future Work – Better Support for Windows Style Key Bindings
43.15.3 Future Work – Misc Key Binding Ideas
Author: Ben Wing
Abstract: We need a standard mechanism that allows different global key-binding schemes to be defined. Ideally, this would be the keyboard action interface that I have proposed; however, that would require a lot of work on the part of mode maintainers and other external Elisp packages and will not be ready in the short term. So I propose a very kludgy interface, along the lines of what is done in Viper currently. Perhaps we can rip the key-munging code out of Viper and make a separate extension that implements a global key-binding-scheme munging feature. This way a key-binding scheme could rearrange all the default keys and still have all sorts of other code, which depends on the standard keys being in their default locations, work correctly.
Author: Ben Wing
Abstract: This page describes how we could create an XEmacs extension that modifies the global key bindings so that a Windows user would feel at home when using the keyboard in XEmacs. Some of these bindings don’t conflict with standard XEmacs keybindings and should be added by default, at the very least under Windows, and probably under X as well. Other key bindings would need to be implemented in a Windows compatibility extension that can be enabled and disabled on the fly, following the conventions outlined in Standard interface for enabling extensions. Ideally, this should be implemented using the keyboard action interface, but that will not be available in the short term, so we will have to resort to some awful kludges, following the model of Michael Kifer’s Viper mode.
We really need to make XEmacs provide standard Windows key bindings as much as possible. Currently, for example, there are at least two packages that allow the user to make a selection using the shifted arrow keys, and neither package works all that well, or is maintained. There should be one well-written piece of code that does this, and it should be a standard part of XEmacs. In fact, it should be turned on by default under Windows, and probably under X as well. (As an aside here, one point of contention in how to implement this involves what happens if you select a region using the shifted arrow keys and then hit the regular arrow keys. Does the region remain selected or not? I think there should be a variable that controls which of these two behaviors you want. We can argue over what the default value of this variable should be. The standard Windows behavior here is to keep the region selected, but move the insertion point elsewhere, which is unfortunately impossible to implement in XEmacs.)
Some thought should be given to what to do about the standard Windows
control and alt key bindings. Under NTEmacs, there is a variable that
controls whether the alt key behaves like the Emacs meta key, or whether
it is passed on to the menu as in standard Windows programs. We should
surely implement this and put this option on the Options menu.
Making Alt-f, for example, invoke the File menu is not all that disruptive in XEmacs, because the user can always type ESC f to get the meta-key functionality. Making Control-x, for example, do Cut is much, much more problematic, of course, but we should consider how to implement this anyway. One possibility would
be to move all of the current Emacs control key bindings onto
control-shift plus a key, and to make the simple control keys follow the
Windows standard as much as possible. This would mean, for example,
that we would have the following keybindings:
Control-x ==> Cut
Control-c ==> Copy
Control-v ==> Paste
Control-z ==> Undo
Control-f ==> Find
Control-a ==> Select All
Control-s ==> Save
Control-p ==> Print
Control-y ==> Redo (this functionality is available in XEmacs with Kyle Jones’ redo.el package, but it should be better integrated)
Control-n ==> New
Control-o ==> Open
Control-w ==> Close Window
The changes described in the previous paragraph should be put into an extension named windows-keys.el (see Standard interface for enabling extensions) so that it can be enabled and disabled on the fly using a menu item and can be selected as the default for a particular user in their custom options file. Once this is implemented, the Windows installer should also be modified so that it brings up a dialog box allowing the user to select which key-binding scheme they would prefer as the default: the XEmacs standard bindings, Vi bindings (i.e. Viper mode), Windows-style bindings, Brief, CodeWright, Visual C++, or whatever else we manage to implement.
Author: Ben Wing
Maybe this should execute anonymous macros. (Another possibility is insert-register, but that is easy to simulate with a keyboard macro.)
Author: Ben Wing
(int-to-char (1+ c))

where c is the arg, specbound.

(byte-compile-snippet (int-to-char (1+ c))
                      (c))        ; environment of local vars
43.16.1 Future Work – Autodetection
43.16.2 Future Work – Conversion Error Detection
43.16.3 Future Work – Unicode
43.16.4 Future Work – BIDI Support
43.16.5 Future Work – Localized Text/Messages
There are various proposals contained here.
Author: Ben Wing
The current autodetection mechanism in XEmacs Mule has many problems. For one thing, it is wrong too much of the time. Another problem, although easily fixed, is that the priority lists are fixed rather than varying depending on the particular locale. Finally, it doesn’t warn the user when it’s not sure of the encoding or when a mistake is made during decoding. In both of these situations the user should be presented with a list of likely encodings and given the choice, rather than simply proceeding anyway and giving a result that is likely to be wrong and may cause data corruption when the file is saved out again.
All coding systems are categorized according to their type. Currently this includes ISO2022, Big 5, Shift-JIS, UTF8 and a few others. In the future there will be many more types defined and this mechanism will be generalized so that it is easily extendable by the Lisp programmer.
In general, each coding system type defines a series of subtypes which are handled differently for the purpose of detection. For example, ISO 2022 defines many different subtypes such as 7 bit, 8 bit, locking shift, designating and so on. UCS2 may define subtypes such as normal and byte reversed.
The detection engine works conceptually by calling the detection methods of all of the defined coding system types in parallel on successive chunks of data (which may, for example, be 4K in size, but where the size makes no difference except for optimization purposes) and watching the results until either a definite answer is determined or the end of data is reached. The way the definite answer is determined will be defined below. The detection method of the coding system type is passed some data and a chunk of memory, which the method uses to store its current state (and which is maintained separately for each coding system type by the detection engine between successive calls to the coding system type’s detection method). Its return value should be an alist consisting of a list of all of the defined subtypes for that coding system type along with a level of likelihood and a list of additional properties indicating certain features detected in the data. The extra properties returned are defined entirely by the particular coding system type and are used only in the algorithm described below under “user control.” However, the levels of likelihood have a standard meaning as follows:
Level 4 means “near certainty” and typically indicates that a signature has been detected, usually at the beginning of the data, indicating that the data is encoded in this particular coding system type. An example of this would be the byte order mark at the beginning of UCS2 encoded data or the GZIP mark at the beginning of GZIP data.
Level 3 means “highly likely” and indicates that tell-tale signs have been discovered in the data that are characteristic of this particular coding system type. Examples of this might be ISO 2022 escape sequences or the current Unicode end of line markers at regular intervals.
Level 2 means “strongly statistically likely” indicating that statistical analysis concludes that there’s a high chance that this data is encoded according to this particular type. For example, this might mean that for UCS2 data, there is a high proportion of null bytes or other repeated bytes in the odd-numbered bytes of the data and a high variance in the even-numbered bytes of the data. For Shift-JIS, this might indicate that there were no illegal Shift-JIS sequences and a fairly high occurrence of common Shift-JIS characters.
Level 1 means “weak statistical likelihood” meaning that there is some indication that the data is encoded in this coding system type. In fact, there is a reasonable chance that it may be some other type as well. This means, for example, that no illegal sequences were encountered and at least some data was encountered that is purposely not in other coding system types. For Shift-JIS data, this might mean that some bytes in the range 128 to 159 were encountered in the data.
Level 0 means “neutral” which is to say that there’s either not enough data to make any decision or that the data could well be interpreted as this type (meaning no illegal sequences), but there is little or no indication of anything particular to this particular type.
Level -1 means “weakly unlikely”, meaning that some data was encountered that could conceivably be part of the coding system type but is probably not; for example, excessively long line lengths or very rarely encountered sequences.
Level -2 means “strongly unlikely” meaning that typically a number of illegal sequences were encountered.
The algorithm to determine when to stop and declare the data detected as a particular coding system uses a priority list, which is typically specified as part of the language environment determined from the current locale or the user’s choice. This priority list consists of a list of coding system subtypes, along with a minimum level required for positive detection and, optionally, additional properties that need to be present. Using the return values from all of the detection methods called, the detection engine looks through this priority list until it finds a positive match. Along with each subtype in the priority list is a particular coding system to return when the subtype is encountered. (For example, in a Japanese-language environment, particular subtypes of ISO 2022 will be associated with the Japanese coding system versions of those subtypes.) It is perfectly legal, and in fact quite common, to list the same subtype more than once in the priority list with successively lower requirements. Other properties that can be listed in the priority list for a subtype are “reject”, meaning that the data should never be detected as this subtype, and “ask”, meaning that if the data is detected to be this subtype, the user will be asked whether they actually mean this. This latter property could be used, for example, towards the bottom of the priority list.
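The priority-list scan just described might look roughly like this in C; the types and names are invented for illustration, and the “ask” case is left as a note rather than implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the priority-list scan; the type and field
   names here are invented, not actual XEmacs internals. */

/* One detection method's verdict: a subtype and its likelihood (-2 .. 4). */
typedef struct { const char *subtype; int level; } detect_result;

typedef enum { ACT_ACCEPT, ACT_REJECT, ACT_ASK } prio_action;

/* One entry of the locale's priority list. */
typedef struct
{
  const char *subtype;
  int min_level;               /* minimum likelihood required */
  prio_action action;
  const char *coding_system;   /* coding system to return on a match */
} prio_entry;

/* Walk the priority list in order and return the coding system of the
   first entry whose subtype was detected at or above min_level.
   ACT_REJECT entries never match; a full version would prompt the user
   for ACT_ASK entries instead of accepting silently. */
static const char *
match_priority_list (const prio_entry *plist, size_t np,
                     const detect_result *res, size_t nr)
{
  for (size_t i = 0; i < np; i++)
    {
      if (plist[i].action == ACT_REJECT)
        continue;
      for (size_t j = 0; j < nr; j++)
        if (strcmp (plist[i].subtype, res[j].subtype) == 0
            && res[j].level >= plist[i].min_level)
          return plist[i].coding_system;
    }
  return NULL;   /* no positive match: present the user with choices */
}
```

Listing the same subtype twice with decreasing min_level falls out of the linear scan for free.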
In addition there is a global variable which specifies the minimum number of characters required before any positive match is reported. There may actually be more than one such variable for different sources of data, for example, detection of files versus detection of subprocess data.
Whenever a file is opened and detected to be a particular coding system, the subtype, the coding system and the associated level of likelihood will be prominently displayed either in the echo area or in a status box somewhere.
If no positive match is found according to the priority list, or if the matches that are found have the “ask” property on them, then the user will be presented with a list of choices of possible encodings and asked to choose one. This list is typically sorted first by level of likelihood, and then within this, by the order in which the subtypes appear in the priority list. This list is displayed in a special kind of dialog box or other buffer allowing the user, in addition to just choosing a particular encoding, to view what the file would look like if it were decoded according to the type.
Furthermore, whenever a file is decoded according to a particular type, the decoding engine keeps track of status values that are output by the coding system type’s decoding method. Generally, this status will be in the form of errors or warnings of various levels, some of which may be severe enough to stop the decoding entirely, and some of which may either indicate definitely malformed data but from which it’s possible to recover, or simply data that appears rather questionable. If any of these status values are reported during decoding, the user will be informed of this and asked “are you sure?” As part of the “are you sure” dialog box or question, the user can display the results of the decoding to make sure it’s correct. If the user says “no, they’re not sure,” then the same list of choices as previously mentioned will be presented.
Also appeared under heading "Implementation of Coding System Priority Lists in Various Locales" ?
Author: Stephen Turnbull
Date: 11/1/1999 2:48 AM
>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@srce.hr> writes:

[Ben sez:]
>> You are perfectly free to set up your XEmacs like this, but
>> XEmacs/Mule will autodetect by default if there is no
>> Content-Type: info and no reason to believe we are dealing with
>> binary files.

Hrvoje> In that case, it will be a serious mistake to make
Hrvoje> --with-mule the default, ever. I think more care should
Hrvoje> be shown in meeting the need of European users.
Hrvoje, I don’t understand what you are worrying about. I suspect you are worrying about Handa’s hyperactive and obstinate Mule, not what Ben has in mind. Yes, Ben has said "better guessing," but that’s simply not reasonable without substantial language environment information. I think trying to detect Latin-1 vs. Latin-2 in the POSIX locale would be a big mistake, just as trying to guess Big 5 vs. Shift-JIS in a European locale would be a big mistake.
If Ben doesn’t mean "more appropriate use of language environment information" when he writes "better guessing," I, as much as you, want to see how he plans to do that. Ben? ("Yes/no/oops I need to think about it" is good enough if you have specifics you intend to put in the RFC you’re planning to present.)
Let me give a formal proposal of what I would like to see in the autodetection specification.
N.B. This will cause breakage for all 1-byte users because the default case can no longer assume Latin-1. You may be able to use the TTY font or the Xt -font option to fake this, and default to iso8859-1; I would hope that we would not use such a kludge in the beta versions, although it might be satisfactory for general use. In particular, encodings like VISCII (Vietnamese) and I believe KOI-8 (Cyrillic) are not ISO-2022-clean, but using C1 control characters as a heuristic for detecting binary files is useful.
If we do allow it, I think that XEmacs should bitch and warn that the practices of implicitly specifying language environment by -font and defaulting on TTYs is deprecated and likely to be obsoleted.
Each of the following cases is given in the order of priority of detection. I’m not sure I’m serious about the top priority given the (optional) Unicode detection. This may be appropriate if Ben is right that ISO-2022 is going to disappear, but possibly not until then (two two-byte sequences out of 65536 is probably 1.99 too many). It probably isn’t too risky if (6)(c) is taken pretty seriously; a Unicode file should contain _no_ private use characters unless the encoding is explicitly specified, and that’s a block of 1/10 of the code space, which should help a lot in detecting binary files.
N.B. Latin-1 will be detected as binary, as for any Latin-*.
N.B. An explicit ISO-2022 designation is semantically equivalent to a Content-Type: header. It is more dangerous because shorter, but I think we should recognize them by default despite the slight risk; XEmacs is a text editor.
N.B. This is unlikely to be as dangerous as it looks at first glance. Any file that includes an 8-bit-set byte before the first valid designation should be detected as binary.
N.B. The reason for permitting a class is for cases like Cyrillic where there are both ISO-8859 encodings and incompatible encodings (KOI-8r) in common use. If you want to write a Latin-1 v. Latin-2 detector, be my guest, but I don’t think it would be easy or accurate.
Announce the result of autodetection to the user.
User may request decoding, with autodetected encoding(s) given priority in a list of available encodings.
Optimizations (see (e) below) should avoid introducing data corruption that this default procedure would avoid.
Obviously, it can’t be perfect if any autodecoding is done; users like Hrvoje should have an easily available option to revert to this default (or an optimized approximation which doesn’t actually read the whole file into a buffer) or simply display everything as binary (with the “font” for binary files being a user option).
This could be taken to extremes, like checking by table whether all characters in a Japanese file are actually legitimate JIS codes; that’s insane (and would cause corporate encodings to be recognized as binary). But we should think about the idea that autodetection shouldn’t mean XEmacs can’t change its mind.
Other comments:
It might be reasonable, given Hrvoje’s objections, to require that any autodetection that could cause data loss (any coding system that involves escape sequences, and only those AFAIK: by design, translation to Unicode is invertible) by default prompt the user (presumably with a novice-like ability to retain the prompt, always default to binary, or always default to the autodetected encoding) in the future, at least in locales that don’t need it (POSIX, Latin-any).
Ben thinks that we can remember the input data; I think it’s going to be hard to comprehensively test that a highly optimized version works. Good design will help, but ISO-2022 is enormously complex, and there are many encodings that violate even its lax assumptions. On the other hand, memory is the only way to get non-rewindable streams right.
Hrvoje himself said he would like to have an XEmacs that distinguishes between Latin-1 and Latin-2 text. Where it is possible to do that, this is exactly what autodetection of ISO-2022 and Unicode gives you. Many people would want that, even at some risk of binary corruption.
>> Once again I remind you that XEmacs is a text editor. There >> are lots of files that potentially may have Japanese etc. in >> them without this marked, e.g. C or Elisp files in the XEmacs >> source. Surely you’re not arguing that we interpret even these >> files as binary by default?
Hrvoje> I am. If I want to see Japanese, I’ll setup my Hrvoje> environment that way. But I don’t, and neither do 99% of Hrvoje> Croatian users. I can’t speak for French, Italian, and Hrvoje> others, but I’d assume similar.
Hrvoje> If there is Japanese in the source files, I will see it as Hrvoje> escape sequences, which is perfectly fine, because I don’t Hrvoje> read Japanese.
And some (European) people will have their terminals scrambled, because Shift-JIS contains sequences that can change the state of XTerm (as do fixed-width Unicode and Big5). This may also be a problem with some Windows-12xx encodings; I’m not sure they all are ISO-2022-clean. (This isn’t a problem for XEmacs native X11 frames or native MS-Windows frames, and the XEmacs sources themselves are all in 7-bit ISO-2022 now IIRC. But it is a potential source of great frustration for many users.)
I think that should be considered too, although it is presumably lower priority than the data corruption of binary files.
Author: Ben Wing
Date: 11/1/1999 7:24 AM
Stephen, thank you very much for writing this up. I think it is a good start, and definitely moving in the direction I would like to see things going: more proposals, less arguing. (aka “more light, less heat”) However, I have some suggestions for cleaning this up:
You should try to make it more layered. For example, you might have one section devoted to the workings of autodetection, which starts out like this (the section numbers below are totally arbitrary):
Autodetect() is a function whose arguments are (1) a readable stream, (2) some hints indicating how the autodetection is to proceed, and (3) a value indicating the maximum number of characters to examine at the beginning of the stream. (Possibly, the value in (3) may be some special symbol indicating that we only go as far as the next line, or a certain number of lines ahead; this would be used as part of "continuous autodetection", e.g. when we are decoding the results of an interactive terminal session, where the user may periodically switch encodings, line terminations, etc. as different programs get run and/or telnet or similar sessions are entered into and exited.) We assume the stream is rewindable; if not, insert a "rewinding" stream in front of the non-rewinding stream; this kind of stream automatically buffers the data as necessary.
[You can use pseudo-code terminology here. No need for straight C or ELisp.]
[Then proceed to describe what the hints look like – e.g. you could portray
it as a property list or whatever. The idea is that, for each locale, there
is a corresponding hints value that is used at least by default. The hints
structure also has to be set up to allow for two or more competing hints
specifications to be merged together. For example, the extension of a file
might provide an additional hint or hints about how to interpret the data of
that file, and the caller of autodetect()
, when calling autodetect()
on such a
file, would need to have a way of gracefully merging the default hints
corresponding to the locale with the more specific hints provided by the
extension. Furthermore, users like Hrvoje might well want to provide their
own hints to supplement and override parts of the generic hints – e.g. "I
don’t ever want to see non-European encodings decoded; treat them as binary
instead".]
[Then describe algorithmically how the autodetection works. First, you could
describe it more generally, i.e. presenting an algorithmic overview, then you
could discuss in detail exactly how autodetection of a particular type of
external encoding works – e.g. "for iso2022, we first look for an escape
character, followed by a byte in this range [. ... .] etc."]
This section describes the concept of a locale in XEmacs, and how it is derived from the user’s environment. A locale in XEmacs is a pair, a country and a language, together determining the handling of locale-specific areas of XEmacs. All locale-specific areas in XEmacs make use of this XEmacs locale, and do not attempt to derive the locale from any other sources. The user is free to change the current locale at any time; accessor and mutator functions are provided to do this so that various locale-specific areas can optionally be changed together with it.
[Then you describe how the XEmacs locale is extracted from .emacs, from
setlocale()
, from the LANG environment variables, from -font, or wherever
else. All other sections assume this dirty work is done and never even
mention it]
[Here you describe the default autodetect()
hints value corresponding to each
possible locale. You should probably use a schematic description here, e.g.
an actual Lisp property list, liberally commented.]
[Other sections cover anything I’ve missed. By being very careful to separate out the layers, you simultaneously introduce more rigor (easier to catch bugs) and make it easier for someone else to understand it completely.]
Author: Ben Wing
however, in general the detection code has major problems and needs lots of work:
[see ‘file-coding.h’]
Let’s consider what this might mean for an ASCII text detector. (In order to have accurate detection, especially given the iteration I proposed below, we need active detectors for all types of data we might reasonably encounter, such as ASCII text files, binary files, and possibly other sorts of ASCII files, and not assume that simply "falling back to no detection" will work at all well.)
An ASCII text detector DOES NOT report ASCII text as level 0, since that’s what the detector is looking for. Such a detector ideally wants all bytes in the range 0x20 - 0x7E (no high bytes!), except for whitespace control chars and perhaps a few others; LF, CR, or CRLF sequences at regular intervals (where "regular" might mean an average < 100 chars and 99% < 300 for code and other stuff of the "text file w/line breaks" variety, but for the "text file w/o line breaks" variety, excluding blank lines, averages could easily be 600 or more with 2000-3000 char "lines" not so uncommon); similar statistical variance between odds and evens (not Unicode); frequent occurrences of the space character; letters more common than non-letters; etc. Also, checking for too little variability between character frequencies, and for exclusion of particular characters based on character ranges, can catch ASCII encodings like base-64, UUEncode, UTF-7, etc. Granted, this doesn’t even apply to everything called "ASCII", and we could potentially distinguish ASCII for code, ASCII for text, etc. as separate categories. However, it does give us a lot to work from in deciding what likelihood to choose – and it shows there are in fact a lot of detectable patterns to look for even in something seemingly so generic as ASCII. The detector would report most text files at level 1 or level 2. EUC encodings, Shift-JIS, etc. probably go to level -1 because they also pass the EOL test and all other tests for the ASCII part of the text, but have lots of high bytes, which in essence turn them into binary. Aberrant text files like something in BASE64 encoding might get placed at level 0, because they pass most tests but fail the frequency test dramatically; but they should not be reported any lower, because that would cause explicit prompting, and the user should be able to read any valid text file without prompting.
The escape sequences and the base-64-type checks might send 7-bit iso2022 to 0, but probably not -1, for similar reasons.
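The statistics above could be prototyped along these lines. This is a toy sketch: the thresholds and the particular tests are illustrative, not the proposed detector.

```python
def ascii_text_level(data: bytes) -> int:
    """Toy confidence levels for an ASCII-text detector: high bytes or
    stray control characters rule the data out; line-length and
    character-frequency statistics separate ordinary text from
    base64/uuencode-style ASCII.  All thresholds are illustrative."""
    if not data:
        return 0
    if any(b >= 0x80 for b in data):
        return -1                          # high bytes: binary to this detector
    if any(b < 0x20 and b not in (9, 10, 12, 13) for b in data):
        return -1                          # non-whitespace control characters
    lines = data.split(b"\n")
    avg_line = sum(len(l) for l in lines) / len(lines)
    space_freq = data.count(0x20) / len(data)
    letters = sum(1 for b in data if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A)
    if space_freq > 0.05 and letters > len(data) / 2 and avg_line < 300:
        return 2                           # ordinary prose or code
    if space_freq < 0.01 and avg_line > 60:
        return 0                           # plausible but odd: base64, uuencode, ...
    return 1

assert ascii_text_level(b"hello world\nthis is plain text\n") == 2
assert ascii_text_level(b"QUJERUZHSEk=" * 16 + b"\n") == 0   # base64-ish: level 0
assert ascii_text_level("\u65e5\u672c\u8a9e".encode("euc-jp")) == -1
```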
no-conversion
or something equivalent). it might make sense to divide things into
two phases (internal and external), where the internal phase has a
separate category list and would probably mostly end up handling EOL
detection; but the more i think about it, the more i disagree. with
properly written detectors, and properly organized tables (in
general, those decodings that are more "distinctive" and thus
detectable with greater certainty go lower on the list), we shouldn’t
need two phases. for example, let’s say the example above was also
in CRLF format. The EOL detector (which really detects *plain text*
with a particular EOL type) would return at most level 0 for all
results until the text file is reached, whereas the base64, gzip or
euc-jp decoders will return higher. Once the text file is reached,
the EOL detector will return 0 or higher for the CRLF encoding, and
all other detectors will return 0 or lower; thus, we will successfully
proceed through CRLF decoding, or at worst prompt the user. (The only
external-vs-internal distinction that might make sense here is to
favor coding systems of the correct source type over those that
require conversion between external and internal; if done right, this
could allow the CRLF detector to return level 1 for all CRLF-encoded
text files, even those that look like Base-64 or similar encoding, so
that CRLF encoding will always get decoded without prompting, but not
interfere with other decoders. On the other hand, this
external-vs-internal distinction may not matter at all – with
automatic internal-external conversion, CRLF decoding can occur
before or after decoding of euc-jp, base64, iso2022, or similar,
without any difference in the final results.)
#### What are we trying to say here? For base64, whether CRLF decoding happens before base64 decoding is irrelevant: the CR and LF bytes will be thrown out anyway, since whitespace is not significant in base64.
[sjt considers all of this to be rather bogus. Ideas like "greater certainty" and "distinctive" can and should be quantified. The issue of proper table organization should be a question of optimization.]
[sjt wonders if it might not be a good idea to use Unicode’s newline character as the internal representation so that (for non-Unicode coding systems) we can catch EOL bugs on Unix too.]
ben [at least that’s what sjt thinks]
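The level-based selection the discussion above assumes might be sketched as follows; the detectors and levels here are toy stand-ins, not XEmacs's actual detector set:

```python
def run_detectors(data, detectors):
    """Run every detector over the data and pick a coding system,
    following the level scheme above: level >= 1 means confident,
    0 means plausible-but-ask, negative means effectively ruled out.
    Ties go to the detector listed first, i.e. more 'distinctive'
    encodings should be listed earlier."""
    results = [(det(data), name) for name, det in detectors]
    best = max(level for level, _ in results)
    if best >= 1:
        for level, name in results:
            if level == best:
                return name
    return "prompt-user"      # nothing was confident enough

detectors = [
    # Illustrative detectors, not real ones.
    ("utf-8-bom", lambda d: 2 if d.startswith(b"\xef\xbb\xbf") else -1),
    ("binary",    lambda d: 1 if b"\x00" in d else -1),
    ("ascii",     lambda d: 0 if all(b < 0x80 for b in d) else -1),
]
assert run_detectors(b"\xef\xbb\xbfhi", detectors) == "utf-8-bom"
assert run_detectors(b"\x00\x01\x02", detectors) == "binary"
assert run_detectors(b"plain text", detectors) == "prompt-user"
```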
*****
Author: Stephen Turnbull
While this is clearly something of an improvement over earlier designs, it doesn’t deal with the most important issue: to do better than categories (which in the medium term is mostly going to mean "which flavor of Unicode is this?"), we need to look at statistical behavior rather than ruling out categories via presence of specific sequences. This means the stream processor should
–sjt
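One concrete instance of the statistical approach sjt is asking for: telling UTF-16 from one-byte encodings by comparing the zero-byte frequency at odd and even offsets. The threshold is invented for this sketch.

```python
def looks_like_utf16(data: bytes) -> bool:
    """Statistical check: for UTF-16 text in a Latin script, one byte
    of each code-unit pair is almost always zero, so odd and even
    positions have very different distributions.  The 0.4 threshold
    is illustrative, not tuned."""
    if len(data) < 4:
        return False
    evens = data[0::2]
    odds = data[1::2]
    zero_even = evens.count(0) / len(evens)
    zero_odd = odds.count(0) / len(odds)
    return abs(zero_even - zero_odd) > 0.4

utf16le = "hello, world".encode("utf-16-le")
assert looks_like_utf16(utf16le)
assert not looks_like_utf16(b"hello, world")
```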
Author: Ben Wing
A preliminary and simple implementation is:
But you could implement it much more simply and usefully by just determining, for any text being decoded into mule-internal, can we go back and read the source again? If not, remember the entire file (GNUS message, etc) in text properties. Then, implement the UI interface (like Netscape’s) on top of that. This way, you have something that at least works, but it might be inefficient. All we would need to do is work on making the underlying implementation more efficient.
A more detailed proposal for avoiding binary file corruption is:
Basic idea: A coding system is a filter converting an entire input stream into an output stream. The resulting stream can be said to be "correspondent to" the input stream. Similarly, smaller units can correspond. These could potentially include zero width intervals on either side, but we avoid this. Specifically, the coding system works like:
loop (input) {
  read bytes until we have enough to generate a translated character or
  characters; this establishes a "correspondence" between the whole input
  and output in more or less minimal chunks
}

We then do the following processing:
- Eliminate correspondences where one or the other of the I/O streams has a zero interval by combining with an adjacent interval;
- Group together all adjacent "identity" correspondences into as large groups as possible;
- Use text properties to store the non-identity correspondences on the characters. For identity correspondences, use a simple text property on all of them that contains no data but just indicates that the whole string of text is identity-corresponded. (How do we define "identity"? Latin-1, or could it be something else, e.g. Latin-2?)
- Figure out the procedures when text is inserted/deleted and copied or pasted.
- Figure out how to save the file, making use of the correspondences. Allow ways of saving without correspondences, and doing a "save to buffer" with and without correspondences. Need to be clever when dealing with modal coding systems: parse the correspondences to get the internal state right.
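The first two processing steps above, eliminating zero-width sides and grouping identity correspondences, might look like this sketch. Identity is taken here to mean "the Latin-1 reading of the bytes", which is exactly one of the open questions above.

```python
def coalesce(chunks):
    """chunks: (input_bytes, output_text) correspondences as produced
    by the decode loop, in minimal units.  Adjacent 'identity'
    correspondences -- here meaning the text is just the Latin-1
    reading of the bytes -- are grouped into one large correspondence;
    non-identity chunks keep their byte ranges for exact round-tripping."""
    out = []
    for inp, text in chunks:
        identity = text.encode("latin-1", "ignore") == inp
        if out and identity and out[-1][0]:
            _, pi, pt = out[-1]
            out[-1] = (True, pi + inp, pt + text)
        else:
            out.append((identity, inp, text))
    return out

chunks = [(b"a", "a"), (b"b", "b"),
          (b"\x1b$B$\"", "\u3042"),   # ISO-2022-JP sequence -> HIRAGANA A
          (b"c", "c")]
groups = coalesce(chunks)
assert [g[0] for g in groups] == [True, False, True]
assert groups[0] == (True, b"ab", "ab")
```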
Author: Ben Wing
Nov 4, 1999
Finally, I don’t think "save the input" is as hard as you make it out to be. Conceptually, in fact, it’s simple: for each minimal group of bytes where you cannot absolutely guarantee that an external->internal transformation is reversible, you put a text property on the corresponding internal character indicating the bytes that generated this character. We also put a text property on every character, indicating the coding system that caused the transformation. This latter text property is extremely efficient (e.g. in a buffer with no data pasted from elsewhere, it will map to a single extent over all the buffer), and the former cases should not be prevalent enough to cause a lot of inefficiency, esp. if we define what "reversible" means for each coding system in such a way that it correctly handles the most common cases. The hardest part, in fact, is making all the string/text handling in XEmacs be robust w.r.t. text properties.
Author: Stephen Turnbull
We really want to separate out a number of things. Conceptually, there is a nested syntax.
At the top level is the ISO 2022 extension syntax, including charset designation and invocation, and certain auxiliary controls such as the ISO 6429 direction specification. These are octet-oriented, with the single exception (AFAIK) of the "exit Unicode" sequence which uses the UTF’s natural width (1 byte for UTF-7 and UTF-8, 2 bytes for UCS-2 and UTF-16, and 4 bytes for UCS-4 and UTF-32). This will be treated as a (deprecated) special case in Unicode processing.
The middle layer is ISO 2022 character interpretation. This will depend on the current state of the ISO 2022 registers, and assembles octets into the character’s internal representation.
The lowest level is translating system control conventions. At present this is restricted to newline translation, but one could imagine doing tab conversion or line wrapping here. "Escape from Unicode" processing would be done at this level.
At each level the parser will verify the syntax. In the case of a syntax error or warning (such as a redundant escape sequence that affects no characters), the parser will take some action, typically inserting the erroneous octets directly into the output and creating an annotation which can be used by higher level I/O to mark the affected region.
This should make it possible to do something sensible about separating newline convention processing from character construction, and about preventing ISO 2022 escape sequences from being recognized inappropriately.
The basic strategy will be to have octet classification tables, and switch processing according to the table entry.
It’s possible that, by doing the processing with tables of functions or the like, the parser can be used for both detection and translation.
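A sketch of the octet-classification-table idea, with invented class names; the same linear pass could feed either a detector (counting events) or a translator (driving a decoder):

```python
# Octet classes for the lowest ISO 2022 layer; the class table drives
# the dispatch in the parser.  The classes here are illustrative.
C0, ESC, GRAPHIC, C1, HIGH = range(5)

def build_class_table():
    table = [GRAPHIC] * 256
    for b in range(0x00, 0x20):
        table[b] = C0
    table[0x1B] = ESC                  # ESC gets its own class
    for b in range(0x80, 0xA0):
        table[b] = C1
    for b in range(0xA0, 0x100):
        table[b] = HIGH
    return table

CLASSES = build_class_table()

def scan(data: bytes):
    """One linear pass, dispatching on octet class."""
    events = []
    for b in data:
        cls = CLASSES[b]
        if cls == ESC:
            events.append("escape")
        elif cls == C0:
            events.append("control")
        else:
            events.append("graphic")
    return events

assert scan(b"\x1bA!") == ["escape", "graphic", "graphic"]
```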
Author: Ben Wing
When writing a file, we need error detection; otherwise somebody will create a Unicode file without realizing the coding system of the buffer is Raw, and then lose all the non-ASCII/Latin-1 text when it’s written out. We need two levels:
- first, a "safe-charset" level that checks before any actual encoding to see if all characters in the document can safely be represented using the given coding system. FSF has a "safe-charset" property of coding systems, but it’s stupid because this information can be automatically derived from the coding system, at least the vast majority of the time. What we need is some sort of alternative-coding-system-precedence-list, langenv-specific, where everything on it can be checked for safe charsets and then the user given a list of possibilities. When the user does "save with specified encoding", they should see the same precedence list. Again like with other precedence lists, there’s also a global one, and presumably all coding systems not on other list get appended to the end (and perhaps not checked at all when doing safe-checking?). safe-checking should work something like this: compile a list of all charsets used in the buffer, along with a count of chars used. that way, "slightly unsafe" coding systems can perhaps be presented at the end, which will lose only a few characters and are perhaps what the users were looking for.
[sjt sez this whole step is a crock. If a universal coding system is unacceptable, the user had better know what he/she is doing, and explicitly specify a lossy encoding. In principle, we can simply check for characters being writable as we go along. Eg, via an "unrepresentable character handler." We still have the buffer contents. If we can’t successfully save, then ask the user what to do. (Do we ever simply destroy previous file version before completing a write?)]
- when actually writing out, we need error checking in case an individual char in a charset can’t be written even though the charsets are safe. again, the user gets the choice of other reasonable coding systems.
[sjt – something is very confused, here; safe charsets should be defined as those charsets all of whose characters can be encoded.]
- same thing (error checking, list of alternatives, etc.) needs to happen when reading! all of this will be a lot of work!
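The charset-counting step of safe-checking, plus the "slightly unsafe last" ordering, can be sketched like this. The three-way charset classifier and the coding-system sets are toy stand-ins for real charset properties.

```python
from collections import Counter

def charset_usage(text):
    """Step 1 of the safe-checking sketch: count the characters used
    in the buffer, bucketed by charset.  The classifier below is a
    stand-in for looking up each character's real charset."""
    def charset_of(ch):
        cp = ord(ch)
        if cp < 0x80:
            return "ascii"
        if cp < 0x100:
            return "latin-1"
        return "other"
    return Counter(charset_of(c) for c in text)

def rank_codings(text, codings):
    """codings maps a coding-system name to the set of charsets it can
    represent.  Fully safe systems sort first; 'slightly unsafe' ones
    follow, ordered by how many characters they would lose."""
    usage = charset_usage(text)
    def lost(name):
        return sum(n for cs, n in usage.items() if cs not in codings[name])
    return sorted(codings, key=lost)

codings = {"us-ascii": {"ascii"},
           "iso-8859-1": {"ascii", "latin-1"}}
order = rank_codings("na\u00efve caf\u00e9", codings)
assert order == ["iso-8859-1", "us-ascii"]   # loses 0 chars vs. 2
```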
Author: Stephen Turnbull
I don’t much like Ben’s scheme. First, this isn’t an issue of I/O, it’s a coding issue. It can happen in many places, not just on stream I/O. Error checking should take place on all translations. Second, the two-pass algorithm should be avoided if possible. In some cases (eg, output to a tty) we won’t be able to go back and change the previously output data. Third, the whole idea of having a buffer full of arbitrary characters which we’re going to somehow shoehorn into a file based on some twit user’s less than informed idea of a coding system is kind of laughable from the start. If we’re going to say that a buffer has a coding system, shouldn’t we enforce restrictions on what you can put into it? Fourth, what’s the point of having safe charsets if some of the characters in them are unsafe? Fifth, what makes you think we’re going to have a list of charsets? It seems to me that there might be reasons to have user-defined charsets (eg, "German" vs "French" subsets of ISO 8859/15). Sixth, the idea of having language environment determine precedence doesn’t seem very useful to me. Users who are working with a language that corresponds to the language environment are not going to run into safe charsets problems. It’s users who are outside of their usual language environment who run into trouble. Also, the reason for specifying anything other than a universal coding system is normally restrictions imposed by other users or applications. Seventh, the statistical feedback isn’t terribly useful. Users rarely "want" a coding system, they want their file saved in a useful way. We could add a FORCE argument to conversions for those who really want a specific coding system. But mostly, a user might want to edit out a few unsafe characters. So (up to some maximum) we should keep a list of unsafe text positions, and provide a convenient function for traversing them.
–sjt
Author: Ben Wing
Following is an old proposal. Unicode has been implemented already, in a different fashion; but there are some ideas here for more general support, e.g. properties of Unicode characters other than their mappings to particular charsets.
We recognize 128, [256], 128x128, [256x256] for source charsets;
for Unicode, 256x256 or 16x256x256.
In all cases, use tables of tables and substitute a default subtable if entire row is empty.
If destination is Unicode, either 16 or 32 bits.
If destination is charset, either 8 or 16 bits.
For the moment, since we only do 94, 96, 94x94 or 96x96, only do 128 or 128x128 for source charsets and use the range 33-126 or 32-127. (Except ASCII - we special case that and have no table because we can algorithmically translate)
Also have a 16x256x256 table -> 32 bits of Unicode char properties.
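The table-of-tables layout with a shared default subtable might be sketched like this; the sizes and the NO_MAPPING sentinel are illustrative:

```python
NO_MAPPING = 0xFFFF                # sentinel for "no mapping"; illustrative

DEFAULT_ROW = [NO_MAPPING] * 128   # one subtable shared by every empty row

class MappingTable:
    """Two-level 128x128 mapping table as described above: rows that
    are entirely empty all point at one shared default subtable, so a
    sparse charset costs almost nothing."""

    def __init__(self):
        self.rows = [DEFAULT_ROW] * 128

    def set(self, hi, lo, value):
        if self.rows[hi] is DEFAULT_ROW:
            self.rows[hi] = [NO_MAPPING] * 128   # copy-on-write
        self.rows[hi][lo] = value

    def get(self, hi, lo):
        return self.rows[hi][lo]

t = MappingTable()
t.set(0x24, 0x22, 0x3042)           # e.g. JIS 0x2422 -> U+3042
assert t.get(0x24, 0x22) == 0x3042
assert t.get(0x25, 0x22) == NO_MAPPING
assert t.rows[0x25] is DEFAULT_ROW  # untouched rows share one subtable
```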
A particular charset contains two associated mapping tables, for both directions.
API is set-unicode-mapping:
(set-unicode-mapping unicode char)
(set-unicode-mapping unicode charset-code charset-offset)
(set-unicode-mapping unicode vector-of-char)
(set-unicode-mapping unicode list-of-char)
(set-unicode-mapping unicode string-of-char)
(set-unicode-mapping unicode vector-or-list-of-codes charset-offset)
Establishes a mapping between a unicode codepoint (a fixnum) and one or more chars in a charset. The mapping is automatically established in both directions. Chars in a charset can be specified either with an actual character or with a codepoint (i.e. a fixnum) and the charset it’s within. If a sequence of chars or charset codepoints is given, multiple mappings are established for consecutive unicode codepoints starting with the given one. Charset codepoints are specified as most-significant x 256 + least-significant, with both bytes in the range 33-126 (for 94 or 94x94) or 32-127 (for 96 or 96x96), unless an offset is given, which will be subtracted from each byte. (The most common values are 128, for codepoints given with the high bit set, or -32, for codepoints given as 1-94 or 0-95.)
Other APIs:
(write-unicode-mapping file charset)
Write the mapping table for a particular charset to the specified file. The tables are written in an internal format that allows for efficient loading, for portability across platforms and XEmacs invocations, for conserving space, for appending multiple tables one directly after another with no need for a directory anywhere in the file, and for recognizing a file as being in this format (via a magic sequence at the beginning). The data will be appended at the end of a file, so that multiple tables can be written to a file; remove the file first to avoid this.
(write-unicode-properties file unicode-codepoint length)
Write the Unicode properties (not including charset mappings) for the specified range of contiguous Unicode codepoints to the end of the file (i.e. append mode) in a binary format similar to what was mentioned in the write-unicode-mapping description and with the same features.
Extension to set-unicode-mapping:
(set-unicode-mapping list-or-vector-of-unicode-codepoints char)
(set-unicode-mapping list-or-vector-of-unicode-codepoints charset-code charset-offset)
(set-unicode-mapping list-or-vector-of-unicode-codepoints sequence-of-char)
(set-unicode-mapping list-or-vector-of-unicode-codepoints list-or-vector-of-codes charset-offset)
The first two forms are conceptually the inverse of the forms above to specify characters for a contiguous range of Unicode codepoints. These new forms let you specify the Unicode codepoints for a contiguous range of chars in a charset. "Contiguous" here means that if we run off the end of a row, we go to the first entry of the next row, rather than to an invalid codepoint. For example, in a 94x94 charset, valid rows and columns are in the range 0x21-0x7e; after 0x457c, 0x457d, 0x457e comes 0x4621, not something like 0x457f, which is invalid.
The final two forms are the most general, letting you specify an arbitrary set of both Unicode points and charset chars, and the two are matched up just like a series of individual calls. However, if the lists or vectors do not have the same length, an error is signaled.
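The row-wrapping successor function implied by this definition of "contiguous", sketched for a 94x94 charset:

```python
def next_codepoint(cp, size=94):
    """Successor of a charset codepoint (hi*256 + lo, bytes in
    0x21..0x7e for a 94x94 charset), wrapping from the end of one row
    to the start of the next, as described above."""
    first = 0x21
    last = first + size - 1        # 0x7e for 94x94
    hi, lo = cp >> 8, cp & 0xFF
    if lo < last:
        return (hi << 8) | (lo + 1)
    return ((hi + 1) << 8) | first # wrap to the first entry of the next row

assert next_codepoint(0x457c) == 0x457d
assert next_codepoint(0x457e) == 0x4621   # row wrap, not the invalid 0x457f
```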
(load-unicode-mapping file &optional charset)
If charset is omitted, loads all charset mapping tables found and returns a list of the charsets found. If charset is specified, searches through the file for the appropriate mapping tables. (This is extremely fast because each entry in the file gives an offset to the next one). Returns t if found.
(load-unicode-properties file unicode-codepoint)
(list-unicode-entries file)
(autoload-unicode-mapping charset)
...
(unfinished)
Author: Ben Wing
NOTE: There is existing message translation in X Windows of menu names. This is handled through X resources. The files are in ‘PACKAGES/mule-packages/locale/app-defaults/LOCALE/Emacs’, where locale is ‘ja’, ‘fr’, etc.
See lib-src/make-msgfile.lex.
Long comment from jwz, some additions from ben marked "ben":
(much of this comment is outdated, and a lot of it is actually implemented)
Author: Jamie Zawinski
this isn’t implemented yet, but this is the plan-in-progress
In general, it’s accepted that the best way to internationalize is for all messages to be referred to by a symbolic name (or number) and come out of a table or tables, which are easy to change.
However, with Emacs, we’ve got the task of internationalizing a huge body of existing code, which already contains messages internally.
For the C code we’ve got two options:
- use a gettext() form, which takes an "english" string which appears literally in the source, and uses that as a hash key to find a translated string;
In this case, it’s desirable to make as few changes as possible to the C code, to make it easier to merge the code with the FSF version of emacs which won’t ever have these changes made to it. So we should go with the former option.
The way it has been done (between 19.8 and 19.9) was to use gettext(), but also to make massive changes to the source code. The goal now is to use gettext() at run-time and yet not require a textual change to every line in the C code which contains a string constant. A possible way to do this is described below.
(gettext() can be implemented in terms of catgets() for non-Sun systems, so that in itself isn’t a problem.)
For the Lisp code, we’ve got basically the same options: put everything in a table, or translate things implicitly.
Another kink that lisp code introduces is that there are thousands of third- party packages, so changing the source for all of those is simply not an option.
Is it a goal that if some third party package displays a message which is one we know how to translate, then we translate it? I think this is a worthy goal. It remains to be seen how well it will work in practice.
So, we should endeavor to minimize the impact on the lisp code. Certain primitive lisp routines (the stuff in lisp/prim/, and especially in ‘cmdloop.el’ and ‘minibuf.el’) may need to be changed to know about translation, but that’s an ideologically clean thing to do because those are considered a part of the emacs substrate.
However, if we find ourselves wanting to make changes to, say, RMAIL, then something has gone wrong. (Except to do things like remove assumptions about the order of words within a sentence, or how pluralization works.)
There are two parts to the task of displaying translated strings to the user: the first is to extract the strings which need to be translated from the sources; and the second is to make some call which will translate those strings before they are presented to the user.
The old way was to use the same form to do both; that is, GETTEXT() was both the tag that we searched for to build a catalog, and the form which did the translation. The new plan is to separate these two things: the tags that we search for to build the catalog will be stuff that was in there already, and the translation will get done in some more centralized, lower-level place.
This program (‘make-msgfile.c’) addresses the first part, extracting the strings.
For the emacs C code, we need to recognize the following patterns:
message ("string" ... )
error ("string")
report_file_error ("string" ... )
signal_simple_error ("string" ... )
signal_simple_error_2 ("string" ... )
build_translated_string ("string")   #### add this and use it instead of
For the emacs Lisp code, we need to recognize the following patterns:
(message "string" ... )
(error "string" ... )
(format "string" ... )
(read-from-minibuffer "string" ... )
(read-shell-command "string" ... )
(y-or-n-p "string" ... )
(yes-or-no-p "string" ... )
(read-file-name "string" ... )
(temp-minibuffer-message "string")
(query-replace-read-args "string" ... )
I expect there will be a lot like the above; basically, any function which is a commonly used wrapper around an eventual call to message or read-from-minibuffer needs to be recognized by this program.
(dgettext "domain-name" "string")   #### do we still need this?

Things that should probably be restructured:
Author: Ben Wing
ben: (format) is a tricky case. If I use format to create a string that I then send to a file, I probably don’t want the string translated. On the other hand, If the string gets used as an argument to (y-or-n-p) or some such function, I do want it translated, and it needs to be translated before the %s and such are replaced. The proper solution here is for (format) and other functions that call gettext but don’t immediately output the string to the user to add the translated (and formatted) string as a string property of the object, and have functions that output potentially translated strings look for a "translated string" property. Of course, this will fail if someone does something like
(y-or-n-p (concat (if you-p "Do you " "Does he ")
                  (format "want to delete %s? " filename)))
But you shouldn’t be doing things like this anyway.
ben: Also, to avoid excessive translating, strings should be marked as translated once they get translated, and further calls to gettext don’t do any more translating. Otherwise, a call like
(y-or-n-p (format "Delete %s? " filename))
would cause translation on both the pre-formatted and post-formatted strings, which could lead to weird results in some cases (y-or-n-p has to translate its argument because someone could pass a string to it directly). Note that the "translating too much" solution outlined below could be implemented by just marking all strings that don’t come from a .el or .elc file as already translated.
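The mark-strings-as-translated idea can be sketched like this; the catalog contents, the TString marker class, and the wrapper functions are all invented for illustration:

```python
CATALOG = {"Delete %s? ": "L\u00f6schen %s? "}   # toy message catalog

class TString(str):
    """String subclass carrying a 'translated' mark, standing in for
    the string property suggested above."""
    translated = True

def gettext(s):
    if getattr(s, "translated", False):
        return s                       # already translated: no-op
    return TString(CATALOG.get(s, s))  # translate if known, else pass through

def fmt(template, *args):
    # format() translates first, then substitutes %s, and marks the
    # result so later calls to gettext won't translate it again.
    return TString(gettext(template) % args)

def y_or_n_p(prompt):
    # y-or-n-p must translate its argument, but a pre-translated
    # (marked) prompt passes through untouched.
    return gettext(prompt)

msg = y_or_n_p(fmt("Delete %s? ", "foo.c"))
assert msg == "L\u00f6schen foo.c? "   # translated exactly once
```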
Menu descriptors: one way to extract the strings in menu labels would be to teach this program about "^(defvar .*menu\n" forms; that’s probably kind of hard, though, so perhaps a better approach would be to make this program recognize lines of the form
"string" ... ;###translate
where the magic token ";###translate" on a line means that the string constant on this line should go into the message catalog. This is analogous to the magic ";###autoload" comments, and to the magic comments used in the EPSF structuring conventions.
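A sketch of the extraction side for the proposed ;###translate magic comment; the regex and helper name are illustrative:

```python
import re

# A double-quoted string constant, followed later on the same line by
# the magic ;###translate comment.
TRANSLATE_RE = re.compile(r'"((?:[^"\\]|\\.)*)".*;###translate')

def snarf_translate_lines(source):
    """Collect string constants from lines carrying the magic
    ;###translate comment, per the scheme above."""
    found = []
    for line in source.splitlines():
        m = TRANSLATE_RE.search(line)
        if m:
            found.append(m.group(1))
    return found

lisp = '''(defvar my-menu
  '("File"        ;###translate
    ("Open..."    ;###translate
     . find-file)))'''
assert snarf_translate_lines(lisp) == ["File", "Open..."]
```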
So this program manages to build up a catalog of strings to be translated. To address the second part of the problem, actually looking up the translations, there are hooks in a small number of low-level places in emacs.
Assume the existence of a C function gettext(str) which returns the translation of str if there is one, otherwise returns str.
message() takes a char* as its argument, and always filters it through gettext() before displaying it.
display-error, which doesn’t call message directly (it princ’s to streams), must be carefully coded to translate its arguments. This is only a few lines of code.
Fread_minibuffer_internal() is the lowest level interface to all minibuf interactions, so it is responsible for translating the value that will go into Vminibuf_prompt.
The above take care of 99% of all messages the user ever sees.
What should we do about this? We could hack query-replace-read-args to translate its args, but might this be a more general problem? I don’t think we ought to translate all calls to format. We could just change the calling sequence, since this is odd in that the first %s wants to be translated but the second doesn’t.
Solving the "translating too much" problem:
The concern has been raised that in this situation:
then we would display the translation of Help, which would not be correct. We can solve this by adding a bit to Lisp_String objects which identifies them as having been read as literal constants from a .el or .elc file (as opposed to having been constructed at run time as it would in the above case.) To solve this:
Fmessage() takes a lisp string as its first argument. If that string is a constant, that is, was read from a source file as a literal, then it calls message() with it, which translates. Otherwise, it calls message_no_translate(), which does not translate. Ferror() (actually, Fsignal() when the condition is Qerror) works similarly.
More specifically, we do:
Scan specified C and Lisp files, extracting the following messages:
C files:
  GETTEXT (...)
  DEFER_GETTEXT (...)
  DEFUN interactive prompts
Lisp files:
  (gettext ...)
  (dgettext "domain-name" ...)
  (defer-gettext ...)
  (interactive ...)

The arguments given to this program are all the C and Lisp source files of GNU Emacs. .el and .c files are allowed. There is no support for .elc files at this time, but they may be specified; the corresponding .el file will be used. Similarly, .o files can also be specified, and the corresponding .c file will be used. This helps the makefile pass the correct list of files.
The results, which go to standard output or to a file specified with -a or -o (-a to append, -o to start from nothing), are quoted strings wrapped in gettext(...). The results can be passed to xgettext to produce a .po message file.
However, we also need to do the following:
- Definition of Arg below won’t handle a generalized argument as might appear in a function call. This is fine for DEFUN and friends, because only simple arguments appear there; but it might run into problems if Arg is used for other sorts of functions.
- snarf() should be modified so that it doesn’t output null strings and non-textual strings (see the comment at the top of ‘make-msgfile.c’).
- parsing of (insert) should snarf all of the arguments.
- need to add set-keymap-prompt and deal with gettext of that.
- parsing of arguments should snarf all strings anywhere within the arguments, rather than just looking for a string as the argument. This allows if statements as arguments to get parsed.
- begin_paren_counting() et al. should handle recursive entry.
- handle set-window-buffer and other such functions that take a buffer as the other-than-first argument.
- there is a fair amount of work to be done on the C code. Look through the code for #### comments associated with ’#ifdef I18N3’ or with an I18N3 nearby.
- Deal with get-buffer-process et al.
- Many of the changes in the Lisp code marked 'rewritten for I18N3 snarfing' should be undone once (5) is implemented.
- Go through the Lisp code in prim and make sure that all strings are gettexted as necessary. This may reveal more things to implement.
- Do the equivalent of (8) for the Lisp code.
- Deal with parsing of menu specifications.
Author: Ben Wing
Expose XEmacs internal lstreams to Lisp as stream objects. (In addition to the functions given below, each stream object has properties that can be associated with it using the standard put, get, etc. API. For GNU Emacs, where put and get have not been extended to be general property functions but work only on symbols, we would have to create the functions set-stream-property, stream-property, remove-stream-property, and stream-properties. These provide the same functionality as the generic get, put, remprop, and object-plist functions under XEmacs.)
(Implement properties using a hash table, and generalize this so that it is extremely easy to add a property interface onto any kind of object.)
(write-stream STREAM STRING)
Write the STRING to the STREAM. This will signal an error if all the bytes cannot be written.
(read-stream STREAM &optional N SEQUENCE)
Reads data from STREAM. N specifies the number of bytes or characters, depending on the stream. SEQUENCE specifies where to write the data. If N is not specified, data is read until end of file. If SEQUENCE is not specified, the data is returned as a string. If SEQUENCE is specified, it must be large enough to hold the data.
(push-stream-marker STREAM)
Returns an ID, probably a stream marker object.
(pop-stream-marker STREAM)
Backs up the stream to the last marker.
(unread-stream STREAM STRING)
The only valid STREAM is an input stream in which case the data in STRING is pushed back and will be read ahead of all other data. In general, there is no limit to the amount of data that can be unread or the number of times that unread-stream can be called before another read.
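To make the unlimited-pushback semantics concrete, here is a minimal C sketch of an input stream with an unbounded unread buffer. The types and names (sketch_stream, sketch_unread, sketch_getc) are illustrative only, not the actual XEmacs lstream API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: an input stream with an unbounded unread
   (pushback) buffer, as described for unread-stream.  Names are
   illustrative, not the actual XEmacs lstream API. */
typedef struct {
    const char *data;   /* underlying source */
    size_t pos, len;
    char *unread;       /* pushback buffer, grown on demand */
    size_t unread_len, unread_cap;
} sketch_stream;

void sketch_unread(sketch_stream *s, const char *str, size_t n) {
    /* Grow the pushback buffer as needed; there is no fixed limit. */
    if (s->unread_len + n > s->unread_cap) {
        s->unread_cap = (s->unread_len + n) * 2;
        s->unread = realloc(s->unread, s->unread_cap);
    }
    /* Store reversed so the next read pops the first pushed-back char. */
    for (size_t i = 0; i < n; i++)
        s->unread[s->unread_len++] = str[n - 1 - i];
}

int sketch_getc(sketch_stream *s) {
    /* Pushed-back data is always read ahead of the underlying data. */
    if (s->unread_len > 0)
        return (unsigned char) s->unread[--s->unread_len];
    if (s->pos < s->len)
        return (unsigned char) s->data[s->pos++];
    return -1; /* end of stream */
}
```

The key design point is that the pushback buffer is heap-allocated and regrown on demand, which is what makes "no limit to the amount of data that can be unread" possible.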
(stream-available-chars STREAM)
This returns the number of characters (or bytes) that can definitely be read from the stream without an error. This can be useful, for example, when dealing with non-blocking streams, where an attempt to read too much data will result in a blocking error.
(stream-seekable-p STREAM)
Returns true if the stream is seekable. If false, operations such as seek-stream and stream-position will signal an error. However, the functions set-stream-marker and seek-stream-marker will still succeed for an input stream.
(stream-position STREAM)
If STREAM is a seekable stream, returns a position which can be passed to seek-stream.
(seek-stream STREAM N)
If STREAM is a seekable stream, move to the position indicated by N, otherwise signal an error.
(set-stream-marker STREAM)
If STREAM is an input stream, create a marker at the current position, which can later be moved back to. The stream does not need to be a seekable stream. In this case, all successive data will be buffered to simulate the effect of a seekable stream. Therefore use this function with care.
(seek-stream-marker STREAM MARKER)
Move the stream back to the position that was stored in the marker object. (This is generally an opaque object of type stream-marker.)
(delete-stream-marker MARKER)
Destroys the stream marker; if the stream is non-seekable and no other stream markers point to an earlier position, this frees up some buffering information.
(delete-stream STREAM N)
(delete-stream-marker STREAM ID)
(close-stream STREAM)
Writes any remaining data to the stream and closes it and the object to which it’s attached. This also happens automatically when the stream is garbage collected.
(getchar-stream STREAM)
Return a single character from the stream. (This may be a single byte, depending on the nature of the stream.) This is actually a macro with an extremely efficient implementation (as efficient as you can get in Emacs Lisp), so that it can be used without fear in a loop.

The implementation works by reading a large amount of data into a vector and then simply using AREF to read characters one by one from the vector. Because AREF is one of the primitives handled specially by the byte interpreter, this will be very efficient. The actual implementation may in fact use call-with-condition-handler to avoid the necessity of checking for overflow. Its typical implementation is to fetch the vector containing the characters as a stream property, along with the index into that vector; it then retrieves the character, increments the index, and stores it back into the stream.

As a first implementation, we check, when reading a character, whether the index would be out of range. If so, we read another 4096 characters, storing them into the same vector, setting the index back to the beginning, and then proceeding with the rest of the getchar algorithm.
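The buffering idea above can be sketched in C. This is a minimal model of the algorithm, not the real macro: a chunk is read into a vector once, characters are then served by plain indexing (the analogue of AREF), and the vector is refilled only when the index runs off the end. The chunk size here is 8 instead of 4096 so the refill path is easy to exercise.

```c
#include <assert.h>
#include <string.h>

#define CHUNK 8  /* 4096 in the description above; small here for clarity */

typedef struct {
    const char *src; size_t srcpos, srclen;  /* underlying data */
    char vec[CHUNK]; size_t idx, fill;       /* the read-ahead vector */
} buffered_stream;

static size_t refill(buffered_stream *b) {
    /* Copy the next chunk from the source into the vector and reset
       the index to the beginning, as in the described algorithm. */
    size_t n = b->srclen - b->srcpos;
    if (n > CHUNK) n = CHUNK;
    memcpy(b->vec, b->src + b->srcpos, n);
    b->srcpos += n;
    b->idx = 0;
    b->fill = n;
    return n;
}

int buffered_getchar(buffered_stream *b) {
    /* Fast path: a simple indexed fetch, analogous to AREF. */
    if (b->idx < b->fill)
        return (unsigned char) b->vec[b->idx++];
    /* Slow path: vector exhausted; read the next chunk. */
    if (refill(b) == 0)
        return -1;  /* end of stream */
    return (unsigned char) b->vec[b->idx++];
}
```

Almost every call takes the fast path, which is the whole point: the per-character cost is one bounds check plus one array fetch.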
(putchar-stream STREAM CHAR)
This is similar to getchar-stream but it writes data instead of reading data.
Function make-stream
There are actually two stream-creation functions, which are:
(make-input-stream TYPE PROPERTIES)
(make-output-stream TYPE PROPERTIES)
These can be used to create a stream that reads data, or writes data, respectively. PROPERTIES is a property list and the allowable properties in it are defined by the type. Possible types are:
file
(this reads data from a file or writes to a file)
Allowable properties are:
:file-name
(the name of the file)
:create
(for output streams only, creates the file if it doesn’t already exist)
:exclusive
(for output streams only, fails if the file already exists)
:append
(for output streams only; starts appending to the end of the file rather than overwriting the file)
:offset
(the position in bytes in the file where reading or writing should begin. If unspecified, defaults to the beginning of the file, or to the end of the file when :append is specified)
:count
(for input streams only, the number of bytes to read from the file before signaling "end of file". If nil or omitted, the number of bytes is unlimited)
:non-blocking
(if true, reads or writes will fail if the operation would block. This only makes sense for non-regular files).
process
(For output streams only, send data to a process.)
Allowable properties are:
:process
(the process object)
buffer
(Read from or write to a buffer.)
Allowable properties are:
:buffer
(the name of the buffer or the buffer object.)
:start
(the position to start reading from or writing to. If nil, use the buffer's point. If t, use the buffer's point and move point past the end of the data read or written.)
:end
(only for input streams, the position to stop reading at. If nil, continue to the end of the buffer.)
:ignore-accessible
(if true, the defaults for :start and :end ignore any narrowing of the buffer.)
stream
(read from or write to a lisp stream)
Allowable properties are:
:stream
(the stream object)
:offset
(the position at which to begin reading or writing)
:length
(For input streams only, the amount of data to read, defaulting to the rest of the available data.)
:resize
(For output streams only; if true, the stream is resized as necessary to accommodate data written past the end; otherwise such writes will fail.)
memory
(For output only, writes data to an internal memory buffer. This is more lightweight than using a Lisp buffer. The function memory-stream-string can be used to convert the memory into a string.)
debugging
(For output streams only, write data to the debugging output.)
stream-device
(During non-interactive invocations only, read from or write to the initial stream terminal device.)
function
(For output streams only, send data by calling a function, exactly as with the STREAM argument to the print primitive.)
Allowable Properties are:
:function
(the function to call. The function is called with one argument, the stream.)
marker
(Write data to the location pointed to by a marker and move the marker past the data.)
Allowable properties are:
:marker
(the marker object.)
decoding
(As an input stream, reads data from another stream and decodes it according to a coding system. As an output stream, decodes the data written to it according to a coding system and then writes the result to another stream.)
Properties are:
:coding-system
(the coding system, a symbol or coding-system object, which defines the decoding.)
:stream
(the stream on the other end.)
encoding
(As an input stream, reads data from another stream and encodes it according to a coding system. As an output stream, encodes the data written to it according to a coding system and then writes the result to another stream.)
Properties are:
:coding-system
(the coding system, a symbol or coding-system object, which defines the encoding.)
:stream
(the stream on the other end.)
Consider:
(define-stream-type 'type :read-function :write-function :rewind- :seek- :tell- (?:buffer)
Old Notes:
Expose lstreams as hash (put, get, etc. properties) table.
(write-stream stream string)
(read-stream stream &optional n sequence)
(make-stream ...)
(push-stream-marker stream) returns ID, probably a stream marker object
(pop-stream-marker stream) backs up stream to last marker
(unread-stream stream string)
(stream-available-chars stream)
(seek-stream stream n)
(delete-stream stream n)
(delete-stream-marker stream ic) can always be poe only nested if you have set stream marker
(get-char-stream generalizes stream) a macro that tries to be efficient, perhaps by reading the next e.g. 512 characters into a vector and arefing them. Might check aref optimization for vectors in the byte interpreter.
(make-stream 'process :process ... :type write)
Consider (define-stream-type 'type :read-function :write-function :rewind- :seek- :tell- (?:buffer)
Author: Ben Wing
At the low level, all functions that can return multiple values are defined with DEFUN_MULTIPLE_VALUES and take an extra parameter, a struct mv_context *.
It has to be this way to ensure that only the function itself, and no functions it calls, think they're called in an mv context.
apply, funcall, eval might propagate their mv context to their children?
Might need eval-mv to implement calling a fun in an mv context. Maybe also funcall_mv? apply_mv?
Generally, just set up the context appropriately, call the function (noticing whether it's an mv-aware function), and bind the values on the way back or pass them out (e.g. to multiple-value-bind).
The multiple return values from get-specifier should allow the specifier value to be modified in the correct fashion (i.e. should interact correctly with all manner of changes from other callers) using set-specifier. We should check this and see if we need other return values. (how-to-add? inst-list?)
In C, call multiple-values-context to get number of expected values, and multiple-value-set (#, value) to get values other than the first.
(Returns Qno_value, or something similar, if there are no values.
#### Or should it throw? Probably not.
#### What happens if a function returns no values but the caller expects a value?)
Something like funcall_with_multiple_values() for setting up the context.
For efficiency, the byte-code engine could notice Ffuncall calls to m.v. functions and substitute special opcodes during load-time processing, if it mattered.
Author: Ben Wing
Author: Ben Wing
NOTE: We can do a preliminary implementation without multiple values; instead, create a function specifier-instance that returns a list (it will be deleted at some point).
put, get, etc. on vectors to modify properties within them.
map-modifying-instantiator and its force versions below, so that we could implement in turns.
If it notices that it's just replacing one instantiator with another, then instead of copy-tree-ing the new one and throwing away the old, use copy-over-tree to save lots of garbage when called repeatedly.
ILLEGIBLE: GOTO LOO BUI BUGS LAST PNOTE
It might do this through some sort of special instantiator-reference object. This points to the instantiator, records where in the hierarchy the instantiator is, etc. When an instantiator gets removed, this gu*ILLEGIBLE* values report not attached. Somehow that gets communicated back to the image instance in the cache. So somehow or other, the image instance in the cache knows who's using it; when you keep updating the slider value by simply modifying an instantiator (which efficiently changes the internal structure of the specifier), image-instantiate eventually notices that the image instance it points to has no other user and just modifies it. In complex situations some optimizations get lost, but everything is still correct.
vs. Andy's set-image-instance-property, which achieves the same optimizations much more easily, but
Fallback should be a locale/domain.
(get-specifier specifier &optional locale)
#### If locale is omitted, should it be (current-buffer) or 'global?
#### Should the argument not be optional?
If a buffer is specified, find a window showing the buffer by looking; if none, use buffer -> sel from -> etc.
Returns multiple values: the second is the instantiator, the third is the locale containing the instantiator, the fourth is the tag set.

(restart-specifier-instance ...)
like specifier-instance, but allows restarting the lookup, for implementing inheritance, etc. Obsoletes specifier-matching-find-charset, or whatever it is. The restart argument is opaque and is returned as a multiple value of restart-specifier-instance. (It's actually an integer, with the low bits holding the locale and the other bits counting into the list attached to the locale.)
Author: Ben Wing
#### It would also be really nice if you could specify that the characters come out in hex instead of in octal. Mule does that by adding a ctl-hexa variable similar to ctl-arrow, but that's bogus; we need a more general solution. I think you need to extend the concept of display tables into a more general conversion mechanism. Ideally you could specify a Lisp function that converts characters, but this violates the Second Golden Rule and besides would make things way way way way slow.
So instead, we extend the display-table concept, which was historically limited to 256-byte vectors, to one of the following:
The fourth option allows you to specify multiple display tables instead of just one. Each display table can specify conversions for some characters and leave others unchanged. The way a character gets displayed is determined by the first display table with a binding for that character. This way, you could call a function enable-hex-display that adds a hex display table to the list of display tables for the current buffer.
#### ...not yet implemented... Also, we extend the concept of "mapping" to include a printf-like spec. Thus you can make all extended characters show up as hex with a display table like this:
#s(range-table data ((256 524288) (format "%x")))
Since more than one display table is possible, you have great flexibility in mapping ranges of characters.
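The "first display table with a binding wins" lookup can be modeled with a short C sketch. The structures below (range_entry, display_table, lookup_display) are hypothetical stand-ins for the real range-table machinery; the point is only to show the search order over a list of tables.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: each table maps character ranges to a format string
   (e.g. "%x" for hex display), and the first table in the list
   with a binding for a character determines how it is shown. */
typedef struct {
    int lo, hi;              /* inclusive character range covered */
    const char *format;      /* e.g. "%x" for hex display */
} range_entry;

typedef struct {
    const range_entry *entries;
    size_t n;
} display_table;

/* Return the first binding for ch across the table list, or NULL
   if every table leaves ch unchanged. */
const char *lookup_display(const display_table *tables, size_t ntables, int ch) {
    for (size_t t = 0; t < ntables; t++)
        for (size_t e = 0; e < tables[t].n; e++)
            if (ch >= tables[t].entries[e].lo && ch <= tables[t].entries[e].hi)
                return tables[t].entries[e].format;
    return NULL;             /* no table binds ch: display unchanged */
}
```

An enable-hex-display style function would simply push a table covering the extended range (256 to 524288 in the example above) onto the front of this list.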
Author: Ben Wing
Abstract: This page describes many optimizations that can be made to the existing Elisp function call mechanism without too much effort. The most important optimizations can probably be implemented with only a day or two of work. I think it’s important to do this work regardless of whether we eventually decide to replace the Lisp engine.
Many complaints have been made about the speed of Elisp, and in particular about the slowness in executing function calls, and rightly so. If you look at the implementation of the funcall function, you'll notice that it does an incredible amount of work. Now logically, it doesn't need to be so. Let's look first from the theoretical standpoint at what absolutely needs to be done to call a Lisp function.
First, let’s look at the situation that would exist if we were smart enough to have made lexical scoping be the default language policy. We know at compile time exactly which code can reference the variables that are the formal parameters for the function being called (specifically, only the code that is part of that function’s definition) and where these references are. As a result, we can simply push all the values of the variables onto a stack, and convert all the variable references in the function definition into stack references. Therefore, binding lexically-scoped parameters in preparation for a function call involves nothing more than pushing the values of the parameters onto a stack and then setting a new value for the frame pointer, at the same time remembering the old one. Because the byte-code interpreter has a stack-based architecture, however, the parameter values have already been pushed onto the stack at the time of the function call invocation. Therefore, binding the variables involves doing nothing at all, other than dealing with the frame pointer.
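The lexical case above can be sketched in a few lines of C: arguments are pushed on a value stack, a new frame pointer is set, and every parameter reference in the "compiled" body becomes a (frame pointer + offset) stack access. All names here are illustrative, not real interpreter internals.

```c
#include <assert.h>

#define STACK_SIZE 64
static int value_stack[STACK_SIZE];
static int sp = 0;          /* stack pointer */
static int fp = 0;          /* frame pointer for the current call */

/* "Call" a two-argument function: push the args, remember the old
   frame pointer, point fp at the new frame, run the body, then pop. */
int call2(int (*body)(void), int a, int b) {
    int old_fp = fp;
    value_stack[sp++] = a;
    value_stack[sp++] = b;
    fp = sp - 2;            /* frame starts at the first argument */
    int result = body();
    sp = fp;                /* pop the frame */
    fp = old_fp;
    return result;
}

/* A compiled body referencing its parameters purely by stack offset:
   no symbol lookup of any kind is needed at call time. */
static int add_body(void) {
    return value_stack[fp + 0] + value_stack[fp + 1];
}
```

Note that binding here really is "nothing at all" beyond adjusting fp, which is exactly the claim made above for a stack-based byte-code architecture.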
With dynamic scoping, the situation is somewhat more complicated. Because the parameters can be referenced anywhere, and these references cannot be located at compile time, their values have to be stored into a global table that maps the name of the parameter to its current value. In Elisp, this table is called the obarray. Variable binding in Elisp is done using the C function specbind(). (This stands for "special variable binding", where special is the standard Lisp terminology for a dynamically-scoped variable.) What specbind() does, essentially, is retrieve the old value of the variable out of the obarray, remember the value by pushing it, along with the name of the variable, onto what's called the specpdl stack, and then store the new value into the obarray. The term "specpdl" means Special Variable Pushdown List, where Pushdown List is an archaic computer-science term for a stack that used to be popular at MIT. These binding operations, however, should still not take very much time, because of the use of symbols: the location in the obarray where the variable's value is stored has already been determined (specifically, it was determined at the time that the byte code was loaded and the symbol created), so no expensive hash-table lookups need to be performed.
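The specbind/specpdl mechanism just described can be modeled with a short C sketch. The arrays and names (values, specpdl, specbind_sketch, unbind_to_sketch) are hypothetical simplifications: symbols are reduced to small integers indexing a global value table, standing in for the obarray.

```c
#include <assert.h>

#define NSYMS 16
#define SPECPDL_SIZE 64

static int values[NSYMS];   /* obarray analogue: symbol -> current value */
static struct { int sym; int old; } specpdl[SPECPDL_SIZE];
static int specpdl_ptr = 0;

/* Bind: save the old value on the specpdl, install the new one. */
void specbind_sketch(int sym, int newval) {
    specpdl[specpdl_ptr].sym = sym;
    specpdl[specpdl_ptr].old = values[sym];
    specpdl_ptr++;
    values[sym] = newval;
}

/* Unbind back to a saved specpdl depth, restoring old values
   newest-first, as happens when a dynamic extent is exited. */
void unbind_to_sketch(int count) {
    while (specpdl_ptr > count) {
        specpdl_ptr--;
        values[specpdl[specpdl_ptr].sym] = specpdl[specpdl_ptr].old;
    }
}
```

Because the "slot" for each symbol is fixed in advance, each bind and unbind is a couple of array stores, which is why the text argues these operations need not be slow.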
An actual function invocation in Elisp does a great deal more work, however, than was just outlined above. Let’s just take a look at what happens when one byte-compiled function invokes another byte-compiled function, checking for places where unnecessary work is being done and determining how to optimize these places.
The function's parameters are kept in a Lisp list, along with the &optional and &rest keywords. This list has to be parsed for every function invocation, which means that for every element in the list, the element is checked to see whether it's the &optional or &rest keyword, its surrounding cons cell is checked to make sure that it is indeed a cons cell, the QUIT macro is called, etc. What should be happening here is that the argument list is parsed exactly once, at the time that the byte code is loaded, and converted into a C array. The C array should be stored as part of the byte-code object. The C array should also contain, in addition to the symbols themselves, the number of required and optional arguments. At function call time, the C array can be very quickly retrieved and processed.
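A minimal sketch of that load-time parse might look like the following. The structures are illustrative (the real byte-code object layout differs); parameter symbols are reduced to strings for simplicity.

```c
#include <assert.h>
#include <string.h>

#define MAX_ARGS 16  /* sketch only; no overflow checking */

typedef struct {
    const char *params[MAX_ARGS]; /* parameter names, in order */
    int n_required;
    int n_optional;
    int has_rest;                 /* 1 if a &rest parameter is present */
} parsed_arglist;

/* Parse a NULL-terminated token list such as
   { "a", "b", "&optional", "c", "&rest", "r", NULL }, done once at
   load time so call time only walks a flat array. */
void parse_arglist(const char *const *tokens, parsed_arglist *out) {
    int optional = 0, rest = 0, n = 0;
    out->n_required = out->n_optional = out->has_rest = 0;
    for (; *tokens; tokens++) {
        if (strcmp(*tokens, "&optional") == 0) { optional = 1; continue; }
        if (strcmp(*tokens, "&rest") == 0)     { rest = 1; continue; }
        out->params[n++] = *tokens;
        if (rest)          out->has_rest = 1;
        else if (optional) out->n_optional++;
        else               out->n_required++;
    }
}
```

At call time the interpreter would consult n_required, n_optional, and has_rest directly, with no list traversal, keyword comparison, or cons-cell checking in the hot path.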
For each argument, the specbind() function is called. This actually does quite a lot of things, including:
- calling symbol_value_buffer_local_info() to retrieve buffer-local information for the symbol, and then processing the return value from this function in a series of if statements;
- calling Fset() to change the variable's value.
The entire series of calls to specbind() should be inlined and merged into the argument-processing code as a single tight loop, with no function calls in the vast majority of cases. The specbind() logic should be streamlined as follows:
- In the simple cases, perform the binding inline, falling back to the full specbind() to do the work only when necessary.
- Eliminate the checks, currently done in Fset(), that make sure a constant isn't being set. These checks should be made at the time that the byte code for the function is loaded and the C array of parameters to the function is created. (Whether a symbol is constant or not is generally known at XEmacs compile time. The only issue here is with symbols whose names begin with a colon. These symbols should simply be disallowed completely as parameter names.)
Other optimizations that could be done are:
- When byte code is executed (through the function byte-code), the string containing the actual byte code is converted into an array of integers. I added this code specifically for MULE so that the byte-code engine didn't have to deal with the complexities of the internal string format for text. This conversion, however, is generally useful because on modern processors accessing 32-bit values out of an array is significantly faster than accessing unaligned 8-bit values. This conversion takes time, though, and should be done once at load time rather than each time the byte code is executed. This array should be stored in the byte-code object. Currently, this is a bit tricky to do, because byte-code is not actually passed the byte-code object, but rather three of its elements. We can't just change byte-code so that it is directly passed the byte-code object, because this function, with its existing argument calling pattern, is called directly from compiled Elisp files. What we can and should do, however, is create a subfunction that does take a byte-code object and actually implements the byte-code interpreter engine. Whenever the C code wants to execute byte code, it calls this subfunction. byte-code itself also calls this subfunction after conjuring up an appropriate byte-code object and storing its arguments into this object. With a small amount of work, it's possible to do this conjuring in such a way that it doesn't generate any garbage.
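The byte-to-integer widening described above is simple to sketch: at load time the byte string is expanded into an aligned int array, and the execution loop then does only word-sized fetches. The toy two-opcode "engine" here is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Load-time conversion: widen the byte string holding the program
   into an aligned array of integers, once, so the execution loop
   never touches unaligned 8-bit data. */
void widen_program(const unsigned char *bytes, size_t n, int *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = bytes[i];
}

/* A toy engine fetching opcodes from the widened array.
   Opcode 1 = ADD immediate, opcode 2 = MUL immediate. */
int run(const int *prog, size_t n) {
    int acc = 0;
    for (size_t pc = 0; pc < n; pc++) {
        switch (prog[pc]) {
        case 1: acc += prog[++pc]; break;
        case 2: acc *= prog[++pc]; break;
        }
    }
    return acc;
}
```

Storing the widened array in the byte-code object means widen_program runs once per load, while run (the hot path) benefits on every execution.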
- When the function exits, its bindings are undone with unbind_to(). Just as for specbind(), this function does a lot of work that is unnecessary in the vast majority of cases, and it could also be inlined and streamlined.
- Each function call executes the QUIT macro, which essentially involves checking a global volatile variable to see whether additional processing needs to be done.
- funcall checks a number of debugging-related variables, such as debug_on_next_call (see funcall_recording_as()). These could be folded into a single general trip variable that is set whenever any of the individual conditions holds. There is a little bit of code between each of the checks; this code would simply have to be duplicated between the two cases where this general trip variable is true and is false. (Note: the optimization detailed in this item is probably not worth doing on the first pass.)
43.23.1 Future Work – Lisp Engine Discussion
43.23.2 Future Work – Lisp Engine Replacement – Implementation
43.23.3 Future Work – Startup File Modification by Packages
Author: Ben Wing
Abstract: Recently there has been a great deal of talk on the XEmacs mailing lists about potential changes to the XEmacs Lisp engine. Usually the discussion has centered around the question: which is better, Common Lisp or Scheme? This is certainly an interesting debate topic, but it didn't seem to have much practical relevance to me, so I vowed to stay out of the discussion. Recently, however, it seems that people are losing sight of the broader picture. For example, nobody seems to be asking the question, "Would an extension language other than Lisp or Scheme (perhaps not a Lisp variant at all) be more appropriate?" Nor does anybody seem to be addressing what I consider to be the most fundamental question: is changing the extension language a good thing to do?
I think it would be a mistake at this point in XEmacs development to begin any project involving fundamental changes to the Lisp engine or to the XEmacs Lisp language itself. It would take a huge amount of effort to complete even part of this project, and would be a major drain on the already-insufficient resources of the XEmacs development community. Most of the gains that are purported to stem from a project such as this could be obtained with far less effort by making more incremental changes to the XEmacs core. I think it would be an even bigger mistake to change the actual XEmacs extension language (as opposed to just changing the Lisp engine, making few, if any, externally visible changes). The only language change that I could possibly imagine justifying would involve switching to some ubiquitous web language, such as Java and JavaScript, or Perl. (Even among those, I think Java would be the only possibility that really makes sense.)
In the rest of this document I’ll present the broader issues that would be involved in changing the Lisp engine or extension language. This should make clear why I’ve come to believe as I do.
There seems to be a great deal of confusion concerning the difference between interface and implementation. In the context of XEmacs, changing the interface means switching to a different extension language such as Common Lisp, Scheme, Java, etc. Changing the implementation means using a different Lisp engine. There is obviously some relation between these two issues, but there is no particular requirement that one be changed if the other is changed. It is quite possible, for example, to imagine taking the underlying engine for any of the various Lisp dialects in existence, and adapting it so that it implements the same Elisp extension language that currently exists. The vast majority of the purported benefits that we would get from changing the extension language could just as easily be obtained while making minimal changes to the external Elisp interface. This way nearly all existing Elisp programs would continue to work, there would be no need to translate Elisp programs into some other language or to simultaneously support two incompatible Lisp variants, and there would be no need for users or package authors to learn a new extension language that would be just as unfamiliar to the vast majority of them as Elisp is.
Let’s go over the possible reasons for changing the Lisp engine.
Changing the Lisp engine might make XEmacs faster. However, consider the following.
A new Lisp engine with a better garbage collection mechanism might make more efficient use of memory; for example, through the use of a relocating garbage collector. However, consider this:
A new Lisp engine might well be more robust. (On the other hand, it might not be. It is not always easy to tell). However, I think that the biggest problems with robustness are in the part of the C code that is not concerned with implementing the Lisp engine. The redisplay mechanism and the unexec mechanism are probably the biggest sources of robustness problems. I think the biggest robustness problems that are related to the Lisp engine concern the use of GCPRO declarations. The entire GCPRO mechanism is ill-conceived and unsafe. The only real way to make this safe would be to do conservative garbage collection over the C stack and to eliminate the GCPRO declarations entirely. But how many of the Lisp engines that are being considered have such a mechanism built into them?
A new Lisp engine might well improve the maintainability of XEmacs by offloading the maintenance of the Lisp engine. However, we need to make very sure that this is, in fact, the case before embarking on a project like this. We would almost certainly have to make significant modifications to any Lisp engine that we choose to integrate, and without the active and committed support and cooperation of the developers of that Lisp engine, the maintainability problem would actually get worse.
A new Lisp engine might have built in support for various features that we would like to add to the XEmacs extension language, such as lexical scoping and an object system.
Possible reasons for changing the extension language include:
Switching to a language that is more standard and more commonly in use would be beneficial for various reasons. First of all, the language that is more commonly used and more familiar would make it easier for users to write their own extensions and in general, increase the acceptance of XEmacs. Also, an accepted standard probably has had a lot more thought put into it than any language interface created by the XEmacs developers themselves. Furthermore, if our extension language is being actively developed and supported, much of the work that we would otherwise have to do ourselves is transferred elsewhere.
However, both Scheme and Common Lisp flunk the familiarity test. Neither language is being actively used for program development outside of small research communities, and few prospective authors of XEmacs extensions will be familiar with any Lisp variant for real world uses. (I consider the argument that Scheme is often used in introductory programming courses to be irrelevant. Many existing programmers were taught Pascal in their introductory programming courses. How many of them would actually be comfortable writing a program in Pascal?) Furthermore, someone who wants to learn Lisp can’t exactly go to their neighborhood bookstore and pick up a book on this topic.
There are endless arguments about which language is easiest to use. In practice, this largely boils down to which languages are most familiar.
The object-oriented paradigm is the dominant one in use today for new languages. User interface concepts in particular are expressed very naturally in an object-oriented system. However, neither Scheme nor Common Lisp has been designed with object orientation in mind. There is a standard object system for Common Lisp, but it is extremely complex and difficult to understand.
Author: Ben Wing
Let’s take a look at the sort of work that would be required if we were to replace the existing Elisp engine in XEmacs with some other engine, for example, the Clisp engine. I’m assuming here, of course, that we are not going to be changing the interface here at the same time, which is to say that we will be keeping the same Elisp language that we currently have as the extension language for XEmacs, except perhaps for incremental changes that we will make, such as lexical scoping and proper structure support in an attempt to gradually move the language towards an upwardly-compatible goal, such as Common Lisp. I am writing this page primarily as food for thought. I feel fairly strongly that actually doing this work would be a big waste of effort that would inevitably become a huge time sink on the part of nearly everyone involved in XEmacs development, and not only for the ones who were supposed to be actually doing the engine change. I feel that most of the desired changes that we want for the language and/or the engine can be achieved with much less effort and time through incremental changes to the existing code base.
First of all, in order to make a successful Lisp engine change in XEmacs, it is vitally important that the work be done through a series of incremental stages where at the end of each stage XEmacs can be compiled and run, and it works. It is tempting to try to make the change all at once, but this would be disastrous. If the resulting product worked at all, it would inevitably contain a huge number of subtle and extremely difficult to track down bugs, and it would be next to impossible to determine which of the myriad changes made introduced the bug.
Now let’s look at what the possible stages of implementation could be.
The first step would be to introduce another preprocessing stage for the
XEmacs C code, which is done before the C compiler itself is invoked on
the code, and before the standard C preprocessor runs. The C
preprocessor is simply not powerful enough to do many of the things we
would like to do in the C code. What exists now is a combination of hacked-up and tricky-to-maintain machinery (such as the DEFUN macro and the associated DEFSUBR), code constructs that are difficult to write (consider, for example, attempting to do structured exception handling with catch/throw and unwind-protect constructs), and code that is potentially or actually unsafe (such as the uses of alloca, which can easily cause stack overflow when large amounts of memory are allocated in this fashion). The problem is that the C preprocessor does not allow macros
to have the power of an actual language, such as C or Lisp. What our
own preprocessor should do is allow us to define macros, whose
definitions are simply functions written in some language which are
executed at compile time, and whose arguments are the actual arguments
for the macro call, as well as an environment which should have a data
structure representation of the C code in the file and allow this
environment to be queried and modified. It can be debated what the
language should be that these extensions are written in. Whatever the
language chosen, it needs to be a very standard language and a language
whose compiler or interpreter is available on all of the platforms that
we could ever possibly consider porting XEmacs to, which is basically to
say all the platforms in existence. One obvious choice is C, because
there will obviously be a C compiler available, because it is needed to
compile XEmacs itself. Another possibility is Perl, which is already
installed on most systems, and is universally available on all others.
This language has powerful text processing facilities which would
probably make it possible to implement the macro definitions more
quickly and easily; however, this might also encourage bad coding
practices in the macros (often simple text processing is not
appropriate, and more sophisticated parsing or recursive data structure
processing needs to be done instead), and we’d have to make sure that
the nested data structure that comprises the environment could be
represented well in Perl. Elisp would not be a good choice because it
would create a bootstrapping problem. Other possible languages, such as
Python, are not appropriate, because most programmers are unfamiliar
with this language (creating a maintainability problem) and the Python
interpreter would have to be included and compiled as part of the XEmacs
compilation process (another maintainability problem). Java is still
too much in flux to be considered at this point.
The macro facility that we will provide needs to add two features to the
language: the ability to define a macro, and the ability to call a
macro. One good way of doing this would be to make use of special
characters that have no meaning in the C language (or in C++ for that
matter), and thus can never appear in a C file outside of comments and
strings. Two obvious characters are the @ sign and the $ sign. We could, for example, use @define to define new macros, and the $ sign followed by the macro name to call a macro. (Proponents
of Perl will note that both of these characters have a meaning in Perl.
This should not be a problem, however, because the way that macros are
defined and called inside of another macro should not be through the use
of any special characters which would in effect be extending the macro
language, but through function calls made in the normal way for the
language.)
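To make the call side of such a facility concrete, here is a minimal, hypothetical sketch of expanding $name references from a table of named replacement texts. Every name in it (expand, lookup, RETURN_NIL) is invented for illustration; a real implementation would operate on parsed C code and pass each macro function an environment object, as described above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical macro table: maps a name to its replacement text. */
struct macro { const char *name, *body; };
static const struct macro macros[] = {
    { "RETURN_NIL", "return Qnil;" },
    { NULL, NULL }
};

static const char *lookup(const char *name, size_t len)
{
    for (const struct macro *m = macros; m->name; m++)
        if (strlen(m->name) == len && memcmp(m->name, name, len) == 0)
            return m->body;
    return NULL;
}

/* Expand every $name in `src` into `out`.  Unknown names are copied
   through unchanged.  (A real preprocessor would also skip comments
   and string literals.) */
void expand(const char *src, char *out, size_t outsize)
{
    size_t o = 0;
    while (*src && o + 1 < outsize) {
        if (*src == '$' && isalpha((unsigned char)src[1])) {
            const char *start = src + 1;
            const char *q = start;
            while (isalnum((unsigned char)*q) || *q == '_') q++;
            const char *body = lookup(start, (size_t)(q - start));
            if (body) {
                size_t n = strlen(body);
                if (o + n < outsize) { memcpy(out + o, body, n); o += n; }
            } else {
                size_t n = (size_t)(q - start);
                out[o++] = '$';          /* no match: emit verbatim */
                if (o + n < outsize) { memcpy(out + o, start, n); o += n; }
            }
            src = q;
            continue;
        }
        out[o++] = *src++;
    }
    out[o] = '\0';
}
```

The real work, of course, is in what the table entries are: functions executed at compile time rather than fixed strings.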
The program that actually implements this extra preprocessing stage
needs to know a certain amount about how to parse C code. In
particular, it needs to know how to recognize comments, strings,
character constants, and perhaps certain other kinds of C tokens, and
needs to be able to parse C code down to the statement level. (This is
to say it needs to be able to parse function definitions and to separate
out the statements, if blocks, while blocks, etc. within these definitions. It probably doesn’t, however, need to parse the contents of a C expression.) The preprocessing program should work
first by parsing the entire file into a data structure (which may just
contain expressions in the form of literal strings rather than a data
structure representing the parsed expression). This data structure
should become the environment parameter that is passed as an argument to
macros as mentioned above. The implementation of the parsing could and
probably should be done using lex and yacc. One good idea is simply to steal some of the lex and yacc code that is part of GCC.
Here are some possibilities that could be implemented as part of the preprocessing:
Expanded DEFUN macros. These could, for example, take an argument list in the form of a Lisp argument list (complete with keyword parameters and other complex features) and automatically generate the appropriate subr structure, the appropriate C function definition header, and the appropriate call to the DEFSUBR initialization function.
A replacement for the alloca function. This could allocate the memory in any fashion it chooses (calling malloc, using a large global array, or a series of such arrays, etc.) and insert calls in the appropriate places to automatically free up this memory. (Appropriate places here would be at the end of the function and before any return statements. Non-local exits can be handled in the function that actually implements the non-local exit.)
Having the DEFUN macro define a new macro for use when calling a primitive.
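The alloca replacement mentioned in the list above could look something like the following toy (xe_alloca and xe_free_alloca are invented names): memory comes from malloc and is tracked on a list, and the preprocessor would insert a call to the cleanup function at the end of the function and before each return statement.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Singly-linked list of blocks handed out since the last cleanup. */
struct alloca_block { struct alloca_block *next; /* data follows */ };
static struct alloca_block *alloca_blocks = NULL;

/* What the preprocessor would emit in place of alloca(n): memory comes
   from the heap, not the C stack, so huge requests cannot blow the stack. */
void *xe_alloca(size_t n)
{
    struct alloca_block *b = malloc(sizeof *b + n);
    if (!b) abort();
    b->next = alloca_blocks;
    alloca_blocks = b;
    return b + 1;
}

/* What the preprocessor would insert at the end of the function and
   before every return statement. */
void xe_free_alloca(void)
{
    while (alloca_blocks) {
        struct alloca_block *next = alloca_blocks->next;
        free(alloca_blocks);
        alloca_blocks = next;
    }
}
```

A real version would record a depth marker per function so that only the current function’s blocks are freed; this toy simply frees everything allocated since the last cleanup.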
The goal of this stage is to gradually build up, out of the existing XEmacs core, a self-contained Lisp engine that has no dependencies on any of the code elsewhere in the core and has a well-defined, black-box-style interface. (That is, the rest of the C code should not be able to access the implementation of the Lisp engine, and should make as few assumptions as possible about how this implementation works.) The Lisp engine could, and probably should, be built up as a separate library which can be compiled on its own without any of the rest of the XEmacs C code, and can be tested in this configuration as well.
The creation of this engine library should be done as a series of
subsets, each of which moves more code out of the XEmacs core and into
the engine library, and XEmacs should be compilable and runnable between
each sub-step. One possible series of sub-steps would be to first
create an engine that does only object allocation and garbage
collection, then as a second sub-step, move in the code that handles
symbols, symbol values, and simple binding, and then finally move in the
code that handles control structures, function calling, byte-code
execution, exception handling, etc. (It might well be possible to
further separate this last sub-step).
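As a sketch of what the black-box boundary might look like, here is a toy opaque-handle interface (all names hypothetical): the rest of the core would see only the lisp_engine and engine_object types, never the representations behind them.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what the rest of XEmacs would see: opaque handles only --- */
typedef struct lisp_engine lisp_engine;
typedef unsigned long engine_object;

/* --- inside the engine library; invisible to the rest of the core --- */
struct lisp_engine { unsigned long live_objects; };

lisp_engine *engine_create(void)
{
    return calloc(1, sizeof(lisp_engine));
}

/* Toy allocator: handles are just counters.  A real engine would hand
   back tagged references into its own heap. */
engine_object engine_cons(lisp_engine *e)
{
    return ++e->live_objects;
}

unsigned long engine_live_count(const lisp_engine *e)
{
    return e->live_objects;
}

void engine_destroy(lisp_engine *e)
{
    free(e);
}
```

The point is purely structural: because callers cannot name struct lisp_engine’s members, the representation can be swapped out without touching the rest of the core.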
Currently, the XEmacs C code makes all sorts of assumptions about the implementation of the Lisp engine, particularly in the areas of object allocation, object representation, and garbage collection. A different Lisp engine may well have different ways of doing these implementations, and thus the XEmacs C code must be rid of any assumptions about these implementations. This is a tough and tedious job, but it needs to be done. Here are some examples:
GCPRO must go. The GCPRO mechanism is tedious, error-prone, and unmaintainable. As anyone who has worked on the C core of XEmacs knows, figuring out where to insert the GCPRO calls is an exercise in black magic, and debugging crashes that result from incorrect GCPROing is an absolute nightmare. Furthermore, the entire mechanism is fundamentally unsafe: even if we were to use the extra preprocessing stage detailed above to automatically generate GCPRO and UNGCPRO calls for all
Lisp object variables occurring anywhere in the C code, there are still
places where we could be bitten. Consider, for example, code which
calls cons and where the two arguments to this function are both calls to the append function. Now the append function generates new Lisp objects, and it also calls QUIT, which could potentially execute arbitrary Lisp code and cause a garbage collection before returning control to append. In order to generate the arguments to the cons call, append is called twice in a row. When the first append call returns, new Lisp data has been created, but no GCPRO pointers reference it. If the second append call causes a garbage collection, the Lisp data from the first call will be collected and recycled, which is likely to lead to obscure and impossible-to-debug crashes. The only way around this would be to rewrite all function calls whose parameters are Lisp objects in terms of temporary variables, so that no such call ever contains another function call as an argument. This would not only be annoying to implement, even in a smart preprocessor, but would make the C code incredibly slow because of all the constant updating of the GCPRO lists.
The solution is to remove the GCPRO mechanism entirely and simply do conservative garbage collection over the C stack. There are already portable implementations of conservative pointer marking over the C stack, and these could easily be adapted for use in the Elisp garbage collector. If, as outlined above, we use an extra preprocessing stage to create a new version of alloca that allocates its memory somewhere other than the C stack, and we ensure that we don’t declare any large arrays as local variables but use alloca instead, then we can be guaranteed that the C stack is small, and thus the conservative pointer-marking stage will be fast and unlikely to find false matches.
Removing the GCPRO declarations as just outlined would also remove the assumption currently made that garbage collection can occur only in certain places in the C code, rather than at any arbitrary spot (for example, any time an allocation of Lisp data happens). In order to make things really safe, however, we also have to remove another assumption, as detailed in the following item.
Currently, the C code assumes that a Lisp_Object of type buffer and a C pointer to a struct buffer mean basically the same thing, and indiscriminately passes the two kinds of buffer pointers around. With relocatable Lisp objects, the pointers to the C structures might change at any time. (Remember, we are now assuming that a garbage collection can happen at basically any point.) All of the C code needs to be changed so that Lisp objects are always passed around using a Lisp object type, and the underlying pointers are retrieved only at the time when a particular data element of the structure is needed. (As an aside, here’s another reason why Lisp objects, rather than pointers, should always be passed around: if pointers are passed around, it’s conceivable that at the time a garbage collection occurs, the only reference to a Lisp object (for example, a deleted buffer) would be in the form of a C pointer rather than a Lisp object. In such a case, the conservative pointer-marking mechanism might not notice the reference, especially if, in an attempt to eliminate false matches and make the code generally more efficient, it is written to look only for actual Lisp object references.)
One way to make such deferred pointer retrieval efficient would be a new construct, analogous to a while block. You’d write the word lock followed by a parenthesized expression that retrieves the C pointer and stores it into a variable scoped only within the lock block, followed in turn by some code in braces, which is the actual code associated with the lock block and which can make use of this pointer. While the code inside the lock block is executing, that particular pointer and the object it points to are guaranteed not to be relocated.
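The lock block could even be spelled as a plain C macro today, using nested for statements to get the block scoping. This is a hypothetical sketch (lock_depth standing in for whatever relocation latch the collector would consult); note that a return or goto out of the body would bypass the unlock, so a real implementation would need unwind-protect integration.

```c
#include <assert.h>

/* Toy relocation latch: while lock_depth > 0 the collector must not
   move objects.  Invented for illustration; not an XEmacs API. */
static int lock_depth = 0;

/* LOCK(type, var, expr) { ... }: `var` holds the retrieved C pointer,
   is scoped to the block, and the latch is released when the block
   exits normally.  The two for loops exist only to create the scope
   and to run the unlock exactly once. */
#define LOCK(type, var, expr)                                           \
    for (int once_ = (lock_depth++, 1); once_; once_ = 0, lock_depth--) \
        for (type var = (expr); once_; once_ = 0)
```

A usage sketch: `LOCK(struct buffer *, b, XBUFFER(obj)) { ... use b ... }` would pin the buffer for the duration of the braces.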
Once we’ve done all of the work mentioned in the previous steps (and admittedly, this is quite a lot of work), we should have an XEmacs that still uses what is essentially the old and previously existing Lisp engine, but which is ready to have its Lisp engine replaced. The replacement might proceed as follows:
Author: Ben Wing
OK, we need to create a design document for all of this, including:
PRINCIPLE #1: Whenever you have auto-generated stuff, CLEARLY indicate this in comments around the stuff. These comments get searched for, and used to locate the existing generated stuff to replace. Custom currently doesn’t do this.
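The marker-comment mechanism of PRINCIPLE #1 amounts to: search for the markers, replace everything between them wholesale, or append a fresh marked section if they are absent. A minimal sketch, with invented marker spellings:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace whatever sits between the `begin` and `end` marker lines in
   `src` with `body`, writing the result to `out`.  If the markers are
   absent, append a fresh marked section.  The point is that generated
   fragments are located by searching for their surrounding comments,
   then replaced wholesale, never merged. */
void splice_generated(const char *src, const char *begin, const char *end,
                      const char *body, char *out, size_t outsize)
{
    const char *b = strstr(src, begin);
    const char *e = b ? strstr(b + strlen(begin), end) : NULL;
    if (b && e) {
        /* Keep the prefix, rewrite the marked region, keep the rest
           (e points at the end marker, so it is preserved). */
        snprintf(out, outsize, "%.*s%s\n%s\n%s",
                 (int)(b - src), src, begin, body, e);
    } else {
        snprintf(out, outsize, "%s%s\n%s\n%s\n", src, begin, body, end);
    }
}
```

Running it twice with different bodies leaves exactly one marked section containing the newer fragment, which is the idempotence Custom currently lacks.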
PRINCIPLE #2: Currently, lots of functions want to add code to the .emacs. (e.g. I get prompted for my mail address from add-change-log-entry, and then prompted if I want to make this permanent). There needs to be a Lisp API for working with arbitrary code to be added to a user’s startup. This API hides all the details of which file to put the fragment in, where in it, how to mark it with magical comments of the right kind so that previous fragments can be replaced, etc.
PRINCIPLE #3: ALL generated stuff should be loaded before any
user-written init stuff. This way the user can override the generated
settings. Although in the case of customize, it may work when the
custom stuff is at the end of the init file, it surely won’t work for
arbitrary code fragments (which typically do setq or the like).
PRINCIPLE #4: As much as possible, generated stuff should be placed in separate files from non-generated stuff. Otherwise it’s inevitable that some corruption will result.
PRINCIPLE #5: Packages are encouraged, as much as possible, to work within the customize model and store all their customizations there. However, if they really need to have their own init files, these files should be placed in .xemacs/, given normal names (e.g. ‘saved-abbrevs.el’ not .abbrevs), and there should be some magic comment at the top of the file that causes it to get automatically loaded while loading a user’s init file. (Alternatively, the above-named API could specify a function that lets a package specify that they want such-and-such file loaded from the init file, and have the specifics of this get handled correctly.)
OVERARCHING GOAL: The overarching goal is to provide a unified mechanism for packages to store state and setting information about the user and what they were doing when XEmacs exited, so that the same or a similar environment can be automatically set up the next time. In general, we are working more and more towards being a truly GUI app where users’ settings are easy to change and get remembered correctly and consistently from one session to the next, rather than requiring nasty hacking in elisp.
Hrvoje, do you have any interest in this? How about you, Martin? This seems like it might be up your alley. This stuff has been ad-hocked since kingdom come, and it’s high time that we make this work properly so that it could be relied upon, and a lot of things could "just work".
This section was written by Stephen Turnbull <stephen@xemacs.org>, so don’t blame Ben (or Eric and Matthias, for that matter). Feel free to add, edit, and share the blame, guys!
As of late November 2004, this principally means adding support for the ‘Xft’ library, which provides: a more robust font configuration mechanism via Keith Packard’s ‘fontconfig’ library; improved glyph rendering, including antialiasing, via the ‘freetype’ library; and client-side rendering (saving bandwidth and server memory) via the ‘XRender extension’. In fact, patches which provide Xft support have been available for several years, but the authors have been unwilling to deal with several important issues which block integration: Mule support (and more generally, face support); widget support (including the toolbar and menubar); and redisplay refactoring.
However, in late 2003 Eric Knauel <knauel@informatik.uni-tuebingen.de> and Matthias Neubauer <neubauer@informatik.uni-freiburg.de> put forward a relatively complete patch which was robust to daily use in ISO 8859-1 locales, and Stephen Turnbull began work on the integration issues. At this point a (private) CVS branch is available for Stephen’s patch (branch point tag ‘sjt-xft-bp’, branch tag ‘sjt-xft’), and one may be made available for the Knauel-Matthias patch soon.
Of course it’s “unfair” to demand that the implementers of a nice feature like anti-aliasing support deal with accumulated cruft of the last few years, but somebody must, sometime soon. Even core developers are complaining about how slow XEmacs is in some applications, and there is reason to believe that some of the problem is in redisplay. Adding more ad hoc features to redisplay will make the whole module more complex and unintelligible. Even if it doesn’t inherently further detract from efficiency, it will surely make reform and refactoring harder.
Similar considerations apply to Mule support. If Xft support is not carefully designed, or implemented with Mule support soon, it will undoubtedly make later Mule implementation far more difficult than it needs to be, and require redundant work be done (e.g., on ‘Options’ menu support).
Besides the design issue—and many users are requesting more flexibility, primarily face support, from the widgets—with widget support there is also an aesthetic issue. It is horribly unimpressive to have clunky bitmapped fonts on the decorations when pleasant antialiased fonts are available in the buffer.
Finally, these issues interact. Widgets and faces are inherently heavyweight objects, requiring orders of magnitude more computation than simply displaying a string in a fixed font. This will have an efficiency impact, of course. And they interact with each other; Mule was designed for use in buffers and display in Emacs windows—but a widget’s content is usually not a buffer, and widgets need not be displayed in a window, but may appear in other contexts, especially in the gutters. So specifiers will probably have to be reworked, in order to properly support display of different faces in non-buffer, non-window contexts.
Stephen is thinking in terms of the following components of a comprehensive proposal.
In XEmacs, font configuration is handled via faces. Currently XEmacs uses a special type of font specifier to map XEmacs locales to font names. Especially under X11, this can cause annoying problems because of the unreliability of X servers’ mappings from ‘XLFD’ names to X11 fonts, over which XEmacs has no influence whatsoever. However, the ‘fontconfig’ library which is used with ‘Xft’ provides much more reliable mapping, along with a more reliably parsable naming scheme similar to that used by TrueType fonts on MS Windows and the Macintosh. Since the capabilities of font specifiers and ‘fontconfig’ overlap, we should consider using ‘fontconfig’ instead of ‘XLFD’ names. This implies that use of ‘Xft’’s rendering functionality should be separated from use of ‘fontconfig’.
Fontconfig is dramatically different from the X model in several ways. In particular, in its convenience interface fontconfig always returns a font. However, the font returned need not be anything like the desired font. This means that XEmacs must adopt a strategy of delegating the search to fontconfig, then sanity-checking the result, rather than trying to use the fontconfig API to search using techniques appropriate for the X11 core font API. (This isn’t strictly necessary: fontconfig has more complex interfaces which allow listing a subset of fonts that match a pattern, and which don’t go out of their way to return something no matter what. But the original patches didn’t use this approach.)
The ‘Options->Font’ and ‘Options->Font Sizes’ menus are broken, by design, not just by ‘Xft’. Although they work better in Eric and Matthias’s patch than in Stephen’s, even their version has the problem that many fonts are unavailable because they don’t match the current size—which is very strange, since ‘Xft’ fonts are of course scalable. But the whole idea of requiring that the font match the size is strange. And the ‘Options->Font Weights’ menu is just disabled, and has been for eons.
Currently in Stephen’s patch there are five treatments of font resources. There are the ‘XEmacs.face.attributeFont’ resources used to set a single global font specification. Among the widgets, some (still) have a ‘font’ resource using the automatic ‘Xt’ resource conversion to ‘FontStruct’; some have separate ‘font’ and ‘fcFontName’ resources, with the former automatically converted to ‘FontStruct’ by ‘Xt’ and the latter left as a string, to be converted by ‘FcParseName’ later; and some have a single ‘font’ resource which is either converted to ‘FontStruct’ by ‘Xt’ or left as a string, depending on whether ‘Xft’ was enabled by ‘configure’ or not. There is also the ‘xftFont’ resource, which may eventually be retargeted to use an Xt converter function, but is currently simply an alias for the ‘fcFontName’ resource.
Stephen thinks that all of these should be converted to use the face approach, perhaps with some way to set specifications for individual widgets, frames, or buffers. This will require some careful design work to incorporate face support in the widgets. We should just accept any or all of ‘font’, ‘fontSet’, and ‘fontList’ resources, treat them all as lists of font names, either ‘XLFD’- or ‘fontconfig’-style, parse them ourselves (i.e., not use the ‘Xt’ resource manager), and add them to font specifiers as appropriate. But this will require a bit of thought to obey POLA vis-a-vis the usual ‘Xt’ conventions.
With the introduction of the “Xft patch,” the X11, Macintosh, and MS Windows platforms are all able to support multiple font rendering engines in the same binary. Generically, there are several tasks that must be accomplished to render text on the display. In both cases the code is rather disorganized, with substantial cross-platform duplication of similar routines. While it may not be worthwhile to go the whole way to ‘RENDERER_HAS_METHOD’ and ‘MAYBE_RENDMETH’, refactoring these modules around the notion of interfacing a “generic rendering engine interface” to “text” seems like a plausible way to focus this work.
Further evidence for this kind of approach is a bug recently fixed in the ‘xft-sjt’ branch. XEmacs was crashing because the Athena Label widget tried to access a nonexistent font in its initialization routine. The font didn’t exist because although no core X11 font corresponding to the spec existed, an Xft font was found. So the XEmacs font instance existed but it did not specify an X11 core font, only the Xft font. When this object was used to initialize the font for the Label widget, None (0) was passed to XtSetArgs, then XtCreateWidget was called, and the internal initialization routine attempted to access that (nonexistent) font while computing an X11 graphics context (GC).
A similar issue applies to colors, but there Xft colors keep the pixel data internally, so (serendipitously) the X11 color (i.e., pixel) member does get updated.
Besides the rendering engine itself, the XEmacs implementations of these objects are poorly supported by current widget implementations, including the traditional menubar and toolbar, as well as the more recent button, tab control, and progress bar widgets. The refactoring suggested under “Rendering engine objects” should be conducted with an eye to making these widgets support faces, perhaps even to the extent of allowing rendering to X pixmaps (which some Athena widgets support, although they will not support rendering via Xft directly). Especially with ‘XRender’ technology this should not be horribly inefficient.
Traditionally Mule uses a rather rigid and low-level abstraction, the charset, to characterize font repertoires. Unfortunately, support for a given charset is generally neither necessary nor sufficient to support a language. Worse, although X11’s only means for indicating font repertoires is the font’s registry, the actual repertoire of many fonts is either deficient or font-dependent. The only convenience is that the registry maps directly to a Mule charset in most cases, and vice versa.
To date, XEmacs Mule has supported identification of appropriate fonts to support a language’s repertoire of characters by identifying the repertoire as a subset of a union of charsets. To each charset there is a regular expression matching the registry portion of a font name. Then instantiation of a font proceeds by identifying the specifier domain, and then walking down the list of specifications, matching the regexp against font names until a match is found. That font is requested from the system, and if not found, the process continues similarly until a font that can be loaded is found.
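The registry-matching step just described can be illustrated with POSIX regexes: pull the registry-encoding tail off an XLFD font name and match it against the charset’s regexp. The pattern below is illustrative, not XEmacs’s actual table, and real XEmacs uses its own regexp engine rather than regcomp.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Does this XLFD font name's registry-encoding tail match the charset's
   registry regexp?  The registry and encoding are the last two
   '-'-separated fields of an XLFD name. */
int registry_matches(const char *xlfd, const char *registry_regexp)
{
    const char *p = xlfd + strlen(xlfd);
    int dashes = 0;
    while (p > xlfd && dashes < 2)
        if (*--p == '-') dashes++;
    if (dashes < 2) return 0;            /* not a well-formed XLFD tail */
    const char *tail = p + 1;            /* e.g. "jisx0208.1983-0" */

    regex_t re;
    if (regcomp(&re, registry_regexp, REG_EXTENDED | REG_ICASE | REG_NOSUB))
        return 0;
    int ok = regexec(&re, tail, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

This also makes the failure mode concrete: any font whose registry happens to match the regexp is accepted, regardless of its actual repertoire, which is exactly the weakness the following paragraphs describe.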
This has several problems. First, there’s no guarantee that the union will be disjoint. This problem manifests in the display of Unicode representations of text in the ‘POSIX’ default locale, where glyphs are typically drawn from several inappropriate fonts. A similar problem often occurs, though for a different reason, in multilingual messages composed using ‘Gnus’s ‘message-mode’ and MIME support. This problem cannot be avoided with the current design; it is quite possible that a font desired in one context will be shadowed by a font intended to get higher priority in a semantically different but syntactically similar (as far as Mule can tell) context. (Of course, one could attach a different face as a text property, but that requires programming support; it can’t be done by user configuration.) The problem is only exacerbated as more and more Unicode fonts, supporting large repertoires with substantial overlap across fonts, are designed and published.
A second problem is that registry names are often inaccurate. For example, the Japanese JIS X 0208 standard was first published in 1978 (as a relabelling of an older standard). It was then revised in 1983, again in 1990, and once again in 2000, with slight changes to the repertoire and mapping in each revision. Technically, these standards can be distinguished in properly named fonts as ‘jisx0208.1978’, ‘jisx0208.1983’, ‘jisx0208.1990’, ‘jisx0208.2000’, but all of them are commonly simply labelled ‘jisx0208’, and Western distributors, of course, generally lack the expertise to correctly relabel them.
A third problem is that you generally can’t tell if there are “holes” in the repertoire until you try to display the glyph.
All of this tends to break standard idioms for handling Mule fonts in ‘init’ files because they depend on charsets being disjoint repertoires.
TrueType fonts (and the later OpenType standard) provide for a proper character-set query (as a Boolean vector indexed by Unicode code points), as well as a list of supported languages.
I propose that we take advantage of these latter facilities by allowing a font to be specified either as a string (a font name), or as a list whose head is the font name and whose tail is a list of languages and Mule charsets (for backward compatibility) that the user intends to use the font to display. This will probably require a change to the specifier code.
As mentioned above, specifiers will probably also have to be enhanced to recognize ‘widget’ locales and domains, instead of the current hack where special ‘widget’ and ‘gui-element’ faces are created.
Customize needs to deal with all this stuff!!
Stephen has a branch containing his stuff in XEmacs CVS. The branch point tag is ‘sjt-xft-bp’, roughly corresponding to XEmacs 21.5.18, and branch tag is ‘sjt-xft’.
ChangeLogs
A lot of these, especially for Eric and Matthias’s work, are missing. Mea culpa.
Options->Font
Options->Font Size
These menus don’t work. All fonts are greyed out. All sizes are available, but many (most?) faces don’t change size, in particular, ‘default’ does not.
Antialiased text bleeding outside of reported extent
On my PowerBook G4 Titanium 15" screen, X.org server v6.8.1,
dimensions: 1280x833 pixels (433x282 millimeters),
resolution: 75x75 dots per inch,
depth of root window: 24 planes
(yes, those dimensions are broken),
with font "Bitstream Vera Sans Mono-16:dpi=75" antialiased text may
bleed out of the extent reported by XftTextExtents and other such
facilities. This is most obvious with the underscore character in that
font. The bottom of the underscore is antialiased, and insertions or
deletions in the same line before the underscore leave a series of
"phantom" underlines. Except that it doesn’t happen on the very first
such insertion or deletion after a window refresh. A similar effect
sometimes occurs with deletions at the end of the line (no, I can’t
define "sometimes"). See also comments in ‘redisplay-x.c’,
functions x_output_string and x_output_display_block.
I think this is probably an Xft bug, but I’m not sure.
For Stephen’s ‘sjt-xft’ branch, you should keep the following in mind when configuring:
Font specifications are changed with set-face-font (and other specifier-changing functions).
There currently is no explicit way to specify that a particular font be used only for a given language. However, since many fonts support only a limited repertoire such as ISO 8859/1, you can use the precedence of specifications for a given specifier locale to get something of this effect for non-Latin character sets. This will normally work rather poorly for multiple Latin character sets, however, because the repertoires tend to have large amounts of overlap. Support for specifying font by language as well as by character set is planned.
Because fonts supporting other languages tend to support English as
well, if you want to use one font for English and another for the other
language, you must use the append
method when adding font
specifications for the other language.
However, this leaves you with a problem if you want to change the other language’s font: you have to remove the existing specification so it won’t shadow the new one when you append.
I use define-specifier-tag like this:
(define-specifier-tag 'lang-ja)
;; No, I don't try to do real work with this font!  But it makes it
;; obvious that I got the requested font.  :-)
(set-face-font 'default "AirCut-14")
(set-face-font 'default "Kochi Mincho-14" nil '(lang-ja) 'append)
;; Oops, too sober.  Try something to match AirCut.
(set-face-font 'default "Mikachan-14" nil '(lang-ja) 'remove-tag-set-append)
Here are the resources I use. Warning: This interface will change. The tab control and menubar have separate Font and XftFont resources, and use the X resource manager to instantiate a FontStruct from the Font resource. There is no converter facility for XftFont yet, and creating one that handles both FontStruct and XftFont depending on XEmacs’s configuration and the font name seems error-prone at best. Probably we should move to a simple string representation for this resource, and convert to a face in XEmacs rather than a font in Xt/Xft.
! DEPRECATED resource xftFont.
! To be retargeted to an Xt converter which returns a font.
!XEmacs*Tabs.xftFont:          Bitstream Vera Sans-16
!XEmacs*menubar*xftFont:       Bitstream Vera Sans-16
XEmacs*Tabs.fcFontName:        Bitstream Vera Sans-16
XEmacs*menubar*fcFontName:     Bitstream Vera Sans-16
XEmacs.modeline.attributeFont: Bitstream Charter-16
XEmacs.default.attributeFont:  Bitstream Vera Sans Mono-16
I highly recommend use of a proportional font in the modeline because it allows a lot more text to fit there. (Previously the font sizes were quite varied, and there was a comment that this weirdness gave good balance. This isn’t true on my main platform, Mac OS X, and needs to be rechecked on Linux, where it was observed.) Note that you can probably specify a particular Japanese font with something like
XEmacs.default.attributeFont:   Bitstream Vera Sans Mono,Sazanami Mincho-16
Order is important; Japanese fonts will support English, but Sazanami’s Roman characters are not very pretty compared to the Bitstream font. NOTE: This is untested, but should work in theory.
NB: This subtree eventually needs to be moved to the Lispref.
This chapter describes integration of the ‘Xft’ font support library into XEmacs. This library is a layer over the separate ‘FreeType’ rendering engine and ‘fontconfig’ font query and selection libraries. ‘FreeType’ provides rendering facilities for modern, good-looking TrueType fonts with hinting and antialiasing, while ‘fontconfig’ provides a coherent interface to font query and selection which is independent of the rendering engine, although currently it is only used in ‘Xft’ to interface to ‘FreeType’.
From the user’s point of view, ‘fontconfig’ provides a naming convention which is precise, accurate, and convenient. Precision means that all properties available in the programming API can be individually specified. Accuracy means that the truename of the font is exactly the list of all properties specified by the font. Thus, the anomalies that occur with XLFDs on many servers (including modern Linux distributions with XFree86 or X.org servers) cannot occur. Convenience is subjective, of course. However, ‘fontconfig’ provides a configuration system which (1) explicitly specifies the defaults and substitutions that will be made in processing user queries, and (2) allows the user to specify search configuration, abbreviations, substitutions, and defaults that override the system’s, in the same format as used by system files. Further, a standard minimal configuration is defined that ensures that at least serif, sans-serif, and monospace fonts are available on all ‘fontconfig’ systems.
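To make the precision and accuracy claims concrete, here is a minimal sketch using the wrapper functions documented later in this chapter. It assumes an XEmacs built with Xft/fontconfig support, and the exact unparsed string depends on your fontconfig configuration:

```lisp
;; Sketch: round-trip a fontconfig name through a pattern object.
;; Every property in the name remains individually addressable.
(let ((pat (fc-name-parse "Bitstream Vera Sans-12:slant=italic")))
  ;; Individual properties can be queried by name ...
  (fc-pattern-get pat "family" 0 'string)
  ;; ... and the whole pattern unparsed back to a name string,
  ;; which is exactly the list of properties it contains.
  (fc-name-unparse pat))
```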
43.24.5.1 Modern Font Support – Font Concepts    GUI devices, fonts, glyphs, rendering.
43.24.5.2 Modern Font Support – fontconfig       Querying and selecting fonts.
43.24.5.3 Modern Font Support – Xft              Rendering fonts on X11.
In modern systems, displays are invariably raster graphic devices, which present the abstract interface of a pixel array in which each pixel value is a color and each pixel is individually mutable and (usually) readable. In XEmacs, such devices are collectively called GUI devices, as opposed to TTY devices, which are character stream devices that may support control sequences for setting the color of individual characters and the insertion position in a rectangular array. Here we are concerned only with control of GUI devices, but use TTY devices as a standard for comparison.
A font is an indexed collection of glyphs, which are specifications of character shapes. On a TTY device, these shapes are entirely abstract, and the index is the identity function. Typically fonts are embedded in TTY devices, the user has no control over the font from within the application, and where choice is available, there is limited selection, and no extensibility. Simple, functional, and ... ugly.
On GUI devices, the situation is different in every respect. Glyphs may be provided by the device, the application, or the user. Additional glyphs may be added at will at any of those levels. Arbitrary index functions allow the same glyph to be used to display characters in different languages or using application-specific codes. Glyphs have concrete APIs, allowing fine control of rendering parameters, even user-specified shapes. To provide convenient, consistent handling of collections of glyphs, we need a well-defined font API.
We can separate the necessary properties into two types: properties which are common to all glyphs in the collection or a property of the collection itself, and those which are glyph-specific. Henceforth, the former are called font properties and the latter glyph properties.
Font properties include identification like the font family, font-wide design parameters like slant and weight, font metrics like size (nominal height) and average width used for approximate layout (such as sizing a popup dialog), and properties like the default glyph that are associated with the font for convenient use by APIs, but aren’t really an intrinsic property of the font as a collection of glyphs. There may also be a kerning table (used to improve spacing of adjacent glyphs).
Glyph properties include the index and glyph metrics such as ascent, descent, width, offset (the offset to the normal position of the next glyph), and italic correction (used to improve spacing when slanted and unslanted glyphs are juxtaposed). Most important, of course, is the glyph’s shape, which is provided in a format specific to a rendering engine. Common formats include bitmaps (X11 BDF), PostScript programs (Type 1), and collections of spline curves (TrueType). When the shape is not itself a bitmap, it must be rendered to a pixmap, either a region on the display or a separate object which is copied to the display. In that case, the shape may include “multiple masters” or “hints” to allow context-specific rendering which improves the appearance of the glyph on the display.
Note that this use of “glyph” is mostly independent of the XEmacs LISP glyph API (see Glyphs). It is possible to extract a single glyph from a font and encapsulate it in a Lisp_Glyph object, but the LISP glyph API allows access to only a very few glyph properties, none of them related to the rendering process.
XEmacs LISP does provide an API for selecting and querying fonts, in the form of a fairly complete set of wrappers for ‘fontconfig’ (see section Modern Font Support – fontconfig). It also provides some control of rendering of text via wrappers for ‘Xft’ APIs (see section Modern Font Support – Xft), but this API is quite incomplete.
Also, since the font selection and query facilities of ‘Xft’ are provided by ‘fontconfig’, there is some confusion in the API. For example, use of antialiasing to improve the appearance of rendered glyphs can be enabled or disabled. The API for this is to set the ‘fontconfig’ font property antialias on the font. However, from the point of view of ‘fontconfig’ this is merely a hint that the rendering engine may or may not respect. This property cannot be used to select only fonts suitable for antialiasing, for example. And rgba (subpixel geometry) and dpi (pixel density) are conceptually properties of the display, not of the font; they function as hints to the rendering process.
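As a sketch of the confusion described above (using the wrappers from the fontconfig node, and assuming an Xft-enabled build): the antialias hint is attached to a pattern like any other property, even though the renderer may ignore it and it cannot be used for selection:

```lisp
;; Sketch: antialias is set as an ordinary pattern property, but it
;; is only a hint to the rendering engine, not a selection criterion.
(let ((pat (fc-name-parse "Bitstream Vera Sans-12")))
  (fc-pattern-add pat "antialias" t)
  (fc-name-unparse pat))
```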
As a final confusing touch, ‘Xft’ also provides some access to the ‘XRender’ extension provided by some modern X servers. This is mostly limited to colors, but rectangle APIs are also provided. These are (of course) completely independent of fonts, but ‘Xft’ is designed for client-side font rendering, and thus uses the ‘XRender’ extension heavily.
Implementation notes: The functions which initialize the library and handle memory management (e.g., FcInit and FcPatternDestroy) are intentionally not wrapped. (In the latter case, fc-pattern-destroy was provided, but this was ill-considered and will be removed; LISP code should never call this function.) The handling of some of the auxiliary constructs used by ‘fontconfig’ is in transition. The FcObjectSet API has been internalized; it is exposed to LISP as a list of strings. The FcFontSet API is still in use, but it also will be internalized, probably as a list (alternatively, a vector) of Lisp_fc_pattern objects. Changing the representation of ‘fontconfig’ objects (property names) from LISP strings to keywords is under consideration.
If ‘Xft’ (including ‘fontconfig’) support is integrated into the XEmacs build, XEmacs provides the symbol xft at initialization.
XEmacs provides the following functions wrapping the ‘fontconfig’ library API.
Returns t if object is of type fc-fontset, nil otherwise. This API is likely to be removed in the near future.
Counts the number of fc pattern objects stored in the fc fontset object fcfontset. This API is likely to be removed in the near future.
Return the fc pattern object at index i in fc fontset object fcfontset. Return nil if the index exceeds the bounds of fcfontset. This API is likely to be removed in the near future.
Explicitly deallocate fcfontset. Do not call this function from LISP code. You will crash. This API will be removed in the near future.
Returns t if object is of type fc-pattern, nil otherwise.
Return a fresh and empty fc-pattern object.
Parse string name as a fontconfig font name and return its representation as a fc pattern object.
Unparse pattern object pattern to a string.
‘Xft’’s similar function is actually a different API. We provide both for now. (They probably invoke the same code from ‘fontconfig’ internally, but the ‘fontconfig’ implementation is more conveniently called from C.)
Unparse pattern object pattern to a string (using the ‘Xft’ API).
Make a copy of pattern object pattern and return it.
Add attributes to the pattern object pattern. property is a string naming the attribute to add, value the value for this attribute.
value may be a string, integer, float, or symbol, in which case the value will be added as an FcChar8[], int, double, or FcBool respectively.
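A sketch of this type mapping, one call per supported Lisp value type. (fc-pattern-create is assumed here to be the name of the fresh-pattern constructor described above; the pattern contents are illustrative.)

```lisp
;; Sketch: each Lisp value type maps to a fontconfig value type.
(let ((pat (fc-pattern-create)))
  (fc-pattern-add pat "family" "Bitstream Vera Sans") ; string  -> FcChar8[]
  (fc-pattern-add pat "slant"  0)                     ; integer -> int
  (fc-pattern-add pat "size"   12.0)                  ; float   -> double
  (fc-pattern-add pat "antialias" t)                  ; symbol  -> FcBool
  (fc-name-unparse pat))
```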
Remove attribute property from pattern object pattern.
This is the generic interface to FcPatternGet. We don’t support the losing symbol-for-property interface. However, it might be a very good idea to use keywords for property names in LISP.
From pattern, extract property for the id’th member, of type type.
pattern is an ‘Xft’ (‘fontconfig’) pattern object. property is a string naming a ‘fontconfig’ font property. Optional id is a nonnegative integer indexing the list of values for property stored in pattern, defaulting to 0 (the first value). Optional type is a symbol, one of ’string, ’boolean, ’integer, ’float, ’double, ’matrix, ’charset, or ’void, corresponding to the FcValue types. (’float is an alias for ’double).
Symbols with names of the form ‘fc-result-DESCRIPTION’ are returned when the desired value is not available. These are
fc-result-type-mismatch   the value found has an unexpected type
fc-result-no-match        there is no such attribute
fc-result-no-id           there is no value for the requested ID
The Lisp types returned will conform to type:
string           string
boolean          `t' or `nil'
integer          integer
double (float)   float
matrix           not implemented
charset          not implemented
void             not implemented
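A sketch of how the type argument and the error symbols interact, per the tables above (it assumes the font name parses as shown; values depend on your configuration):

```lisp
;; Sketch: querying typed values and provoking the error returns.
(let ((pat (fc-name-parse "Bitstream Vera Sans-12")))
  (list (fc-pattern-get pat "family" 0 'string)  ; a string
        (fc-pattern-get pat "size" 0 'double)    ; a float
        (fc-pattern-get pat "family" 5)          ; fc-result-no-id
        (fc-pattern-get pat "nonesuch" 0)))      ; fc-result-no-match
```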
The types of the following standard properties are predefined by fontconfig. The symbol ’fc-result-type-mismatch will be returned if the object exists but type does not match the predefined type. It is best not to specify a type for predefined properties: a mistaken type here guarantees an error return even when the value is present and of the correct type.
Each standard property has a convenience accessor defined in ‘fontconfig.el’, named in the form ‘fc-pattern-get-property’. The convenience functions are preferred to fc-pattern-get, since a typo in the string naming a property will result in a silent null return, while a typo in a function name will usually result in a compiler or runtime “not fboundp” error. You may use defsubst to define convenience functions for non-standard properties.
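For instance, a defsubst accessor for a hypothetical non-standard property might look like the following sketch (the property name myrasterizer-quirk is made up for illustration):

```lisp
;; Sketch: convenience accessor for a hypothetical non-standard
;; property.  A typo in the accessor's name now fails loudly with
;; "not fboundp" instead of silently returning nil.
(defsubst fc-pattern-get-myrasterizer-quirk (pattern &optional id)
  (fc-pattern-get pattern "myrasterizer-quirk" (or id 0) 'string))
```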
family          String    Font family name
style           String    Font style.  Overrides weight and slant
slant           Int       Italic, oblique or roman
weight          Int       Light, medium, demibold, bold or black
size            Double    Point size
aspect          Double    Stretches glyphs horizontally before hinting
pixelsize       Double    Pixel size
spacing         Int       Proportional, monospace or charcell
foundry         String    Font foundry name
antialias       Bool      Whether glyphs can be antialiased
hinting         Bool      Whether the rasterizer should use hinting
verticallayout  Bool      Use vertical layout
autohint        Bool      Use autohinter instead of normal hinter
globaladvance   Bool      Use font global advance data
file            String    The filename holding the font
index           Int       The index of the font within the file
ftface          FT_Face   Use the specified FreeType face object
rasterizer      String    Which rasterizer is in use
outline         Bool      Whether the glyphs are outlines
scalable        Bool      Whether glyphs can be scaled
scale           Double    Scale factor for point->pixel conversions
dpi             Double    Target dots per inch
rgba            Int       unknown, rgb, bgr, vrgb, vbgr, none - subpixel geometry
minspace        Bool      Eliminate leading from line spacing
charset         CharSet   Unicode chars encoded by the font
lang            String    List of RFC-3066-style languages this font supports
The FT_Face, Matrix, and CharSet types are unimplemented, so the corresponding properties are not accessible from Lisp at this time. If the value of a property returned has type FT_Face, FcCharSet, or FcMatrix, fc-result-type-mismatch is returned.
The following properties, which were standard in ‘Xft’ v.1, are obsolete in ‘Xft’ v.2: encoding, charwidth, charheight, core, and render.
Explicitly deallocate pattern object pattern. Do not call this function from LISP code. You will crash. This API will be removed in the near future.
Return the font on device that most closely matches pattern. pattern is a ‘fontconfig’ pattern object. device is an X11 device. Returns a ‘fontconfig’ pattern object representing the closest match to the given pattern, or an error code. Possible error codes are fc-result-no-match and fc-result-no-id.
List the fonts on device that match pattern for properties. device is an X11 device. pattern is a ‘fontconfig’ pattern to be matched. properties is the list of property names (strings) that should be included in each returned pattern. The result is a ‘fontconfig’ fontset object containing the set of unique matching patterns.
The properties argument does not affect the matching. So, for example,
(mapcar #'fc-name-unparse
        (let ((xfl (fc-list-fonts-pattern-objects
                    nil (fc-name-parse "FreeMono") '("style")))
              (i 0)
              (fl nil))
          (while (< i (fc-fontset-count xfl))
            (push (fc-fontset-ref xfl i) fl)
            (setq i (1+ i)))
          fl))
will return something like ‘(":style=Bold" ":style=Medium" ":style=Oblique" ":style=BoldOblique")’ if you have the FreeFont package installed. Note that the sets of objects in the target pattern and the returned patterns don’t even intersect.
In using fc-list-fonts-pattern-objects, be careful that only intrinsic properties of fonts are included in the pattern. Properties included in the pattern must be matched, or the candidate font will be eliminated from the list. When a font leaves a property unspecified, it is considered a mismatch for any pattern that specifies that property. Thus, inclusion of extraneous properties will result in an empty list. Note that for scalable fonts (at least), size is not an intrinsic property! Thus a specification such as "Bitstream Vera Sans-12" will return an empty list regardless of whether the font is available, which is probably not what you (as programmer or user) want.
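One way around the size pitfall, as a sketch: parse the name, strip the non-intrinsic size property, then list. (fc-pattern-del is assumed here to be the name of the attribute-removal wrapper described earlier in this node.)

```lisp
;; Sketch: "Bitstream Vera Sans-12" alone would match nothing, since
;; size is not an intrinsic property.  Strip size before listing.
(let ((pat (fc-name-parse "Bitstream Vera Sans-12")))
  (fc-pattern-del pat "size")
  (fc-list-fonts-pattern-objects nil pat '("family" "style")))
```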
The list is unsorted. In particular, the pattern ":style=italic,oblique" will not return italic fonts first, then oblique ones; the fonts will be returned in some arbitrary order.
Implementation notes: Fontset objects are slated for removal from the API. In the future fc-list-fonts-pattern-objects will return a list. The device argument is unused, ignored, and may be removed if it’s not needed to match other font-listing APIs. This name will be changed to correspond to Ben’s new nomenclature, probably simply fc-font-list.
Return a fontset object listing all fonts sorted by proximity to pattern. device is an X11 device. pattern is a fontconfig pattern to be matched. Optional argument trim, if non-nil, means to trim trailing fonts that do not contribute new characters to the union repertoire.
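A usage sketch, assuming the argument order device, pattern, trim (which is not spelled out above): take the best match for a generic monospace request.

```lisp
;; Sketch: the returned fontset is ordered best-match-first, so
;; element 0 is the closest match to the pattern.
(let ((sorted (fc-font-sort nil (fc-name-parse "monospace") t)))
  (fc-name-unparse (fc-fontset-ref sorted 0)))
```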
Implementation notes: Fontset objects are slated for removal from the API. In the future fc-font-sort will return a list (or perhaps a vector) of FcPatterns. The device argument is unused, ignored, and may be removed if it’s not needed to match other font-listing APIs.
Temporarily open font fontname (a string) on device xdevice and return the actual fc pattern matched by the Fc library. This function doesn’t make much sense and will be removed from the API.
Check whether string fontname is an XLFD font name.
Level of debugging messages to issue to stderr for Xft. A nonnegative integer. Set to 0 to suppress all warnings. Default is 1 to ensure a minimum of debugging output at initialization. Higher levels give more information.
The major version number of the Xft library that XEmacs was compiled with.
Regular expression matching XLFD font names.
IIRC, we don’t really provide any ‘Xft’ APIs at the LISP level yet.
This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.