XEmacs -- Emacs: The Next Generation

Lisp-level encoding stream interface

Ben Wing <ben@xemacs.org>

An lstream interface for use in creating arbitrary lisp coding systems (not just international encodings but gzip, base64, md5, etc.).

Status

Not for inclusion

Specification is mostly complete.

Open bugs

None.

Other open issues

The following specification needs to be implemented.

- Lisp Stream API

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Expose XEmacs internal lstreams to Lisp as stream objects.  (In
addition to the functions given below, each stream object has
properties that can be associated with it using the standard put, get,
etc. API.  For GNU Emacs, where put and get have not been extended to
be general property functions but work only on symbols, we would have
to create the functions set-stream-property, stream-property,
remove-stream-property, and stream-properties.  These provide the same
functionality as the generic get, put, remprop, and object-plist
functions under XEmacs.)

(Implement properties using a hash table, and *generalize* this so
that it is extremely easy to add a property interface to any kind of
object.)

(write-stream STREAM STRING)

Writes STRING to STREAM.  Signals an error if not all of the bytes
can be written.

(read-stream STREAM &optional N SEQUENCE)

Reads data from STREAM.  N specifies the number of bytes or
characters to read, depending on the stream.  SEQUENCE specifies where
to store the data.  If N is not specified, data is read until end of
file.  If SEQUENCE is not specified, the data is returned as a string.
If SEQUENCE is specified, it must be large enough to hold the data.

(push-stream-marker STREAM)

   Returns an ID, probably a stream-marker object.

(pop-stream-marker STREAM)

   Backs up STREAM to the last marker pushed.

(unread-stream STREAM STRING)

STREAM must be an input stream.  The data in STRING is pushed back and
will be read ahead of all other data.  In general, there is no limit to
the amount of data that can be unread or to the number of times that
unread-stream can be called before another read.
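A minimal C sketch of these pushback semantics, assuming a hypothetical `pb_stream` that wraps an in-memory source (the names are illustrative, not part of the proposal). Pushed-back bytes go on a growable stack, so there is no fixed limit, and they are stored in reverse so that each unread string is read back in its original order, ahead of anything unread earlier and of the source itself.

```c
#include <stdlib.h>

/* Hypothetical input stream: an in-memory source plus a pushback stack. */
typedef struct {
  const char *src;              /* underlying data */
  size_t src_len, src_pos;
  char *pb;                     /* pushed-back bytes; top is pb[pb_len - 1] */
  size_t pb_len, pb_cap;
} pb_stream;

/* unread-stream: push LEN bytes of STR back so they are read before
   everything else.  Returns 0 on success, -1 if memory is exhausted. */
int pb_unread (pb_stream *s, const char *str, size_t len)
{
  if (s->pb_len + len > s->pb_cap)
    {
      size_t cap = s->pb_cap ? s->pb_cap : 16;
      while (cap < s->pb_len + len)
        cap *= 2;                       /* doubling keeps repeated unreads cheap */
      char *p = realloc (s->pb, cap);
      if (!p)
        return -1;
      s->pb = p;
      s->pb_cap = cap;
    }
  /* Store in reverse so that popping yields the bytes in original order. */
  for (size_t i = 0; i < len; i++)
    s->pb[s->pb_len++] = str[len - 1 - i];
  return 0;
}

/* Read one byte: pushed-back data first, then the source; -1 at end. */
int pb_getc (pb_stream *s)
{
  if (s->pb_len > 0)
    return (unsigned char) s->pb[--s->pb_len];
  if (s->src_pos < s->src_len)
    return (unsigned char) s->src[s->src_pos++];
  return -1;
}
```

Unreading "xy" and then "ab" makes the next four reads return a, b, x, y before the source resumes.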

(stream-available-chars STREAM)

This returns the number of characters (or bytes) that can definitely
be read from the stream without an error.  This can be useful, for
example, with non-blocking streams, where an attempt to read too much
data will result in a would-block error.

(stream-seekable-p STREAM)

Returns true if the stream is seekable.  If false, operations such as
seek-stream and stream-position will signal an error.  However, the
functions set-stream-marker and seek-stream-marker will still succeed
for an input stream.

(stream-position STREAM)

If STREAM is a seekable stream, returns a position which can be passed
to seek-stream.

(seek-stream STREAM N)

If STREAM is a seekable stream, move to the position indicated by N,
otherwise signal an error.

(set-stream-marker STREAM)

If STREAM is an input stream, create a marker at the current position,
which can later be moved back to.  The stream does not need to be
seekable; if it is not, all subsequent data will be buffered to
simulate the effect of a seekable stream.  Therefore, use this
function with care.

(seek-stream-marker STREAM MARKER)

Move the stream back to the position that was stored in MARKER.
(This is generally an opaque object of type stream-marker.)

(delete-stream-marker MARKER)

Destroys the stream marker.  If the stream is non-seekable and no
other stream markers point to an earlier position, this also frees
buffered data that is no longer needed.
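One way the marker buffering for non-seekable input streams might work, sketched in C with hypothetical names (`mk_stream`, `mk_set_marker`, and so on). While any marker is live, every byte pulled from the underlying source is also kept in a buffer, and a marker is simply an offset into that buffer. As a simplification of the rule described above, buffered data is discarded only when no markers remain at all, rather than whenever no marker points to an earlier position.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical input stream over a non-seekable source (e.g. a pipe). */
typedef struct {
  int (*next) (void *ctx);   /* returns the next byte, or -1 at end of data */
  void *ctx;
  char *buf;                 /* bytes retained so markers can replay them */
  size_t buf_len, buf_cap;
  size_t pos;                /* read position within buf */
  int live_markers;
} mk_stream;

typedef size_t mk_marker;    /* offset into buf; opaque to callers */

/* Discard buffered bytes that were already read and cannot be replayed. */
static void mk_drop_consumed (mk_stream *s)
{
  if (s->pos == 0)
    return;
  memmove (s->buf, s->buf + s->pos, s->buf_len - s->pos);
  s->buf_len -= s->pos;
  s->pos = 0;
}

int mk_getc (mk_stream *s)
{
  if (s->pos < s->buf_len)                /* replaying buffered data */
    return (unsigned char) s->buf[s->pos++];
  int c = s->next (s->ctx);
  if (c != -1 && s->live_markers > 0)     /* a marker may seek back here */
    {
      if (s->buf_len == s->buf_cap)
        {
          size_t cap = s->buf_cap ? 2 * s->buf_cap : 64;
          char *p = realloc (s->buf, cap);
          if (!p)
            abort ();                     /* sketch: no error recovery */
          s->buf = p;
          s->buf_cap = cap;
        }
      s->buf[s->buf_len++] = (char) c;
      s->pos = s->buf_len;
    }
  return c;
}

mk_marker mk_set_marker (mk_stream *s)
{
  if (s->live_markers == 0)
    mk_drop_consumed (s);                 /* nothing earlier is reachable */
  s->live_markers++;
  return s->pos;
}

void mk_seek_marker (mk_stream *s, mk_marker m)
{
  s->pos = m;
}

void mk_delete_marker (mk_stream *s)
{
  if (--s->live_markers == 0)
    mk_drop_consumed (s);                 /* free what no marker can reach */
}
```

The cost the text warns about is visible here: as long as one marker stays live, the buffer grows with every byte read.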

(delete-stream STREAM N)

(delete-stream-marker STREAM ID)

(close-stream STREAM)

Writes any remaining data to the stream and closes it and the object
to which it's attached.  This also happens automatically when the
stream is garbage collected.

(getchar-stream STREAM)

Returns a single character from the stream.  (This may be a single
byte, depending on the nature of the stream.)  This is actually a
macro with an extremely efficient implementation (as efficient as you
can get in Emacs Lisp), so that it can be used without fear in a loop.
The implementation works by reading a large amount of data into a
vector and then simply using the function AREF to read characters one
by one from the vector.  Because AREF is one of the primitives handled
specially by the byte interpreter, this will be very efficient.  The
actual implementation may in fact use the function
call-with-condition-handler to avoid the necessity of checking for
overflow.  A typical implementation fetches the vector containing the
characters, as well as the index into that vector, as stream
properties.  It then retrieves the character, increments the index,
and stores the index back in the stream.  As a first implementation,
when reading a character we check whether the index would be out of
range.  If so, we read another 4096 characters into the same vector,
set the index back to the beginning, and then proceed with the rest of
the getchar algorithm.
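The refill-on-overflow scheme described for getchar-stream can be sketched in C rather than Emacs Lisp. The struct fields stand in for the vector and index that the text keeps as stream properties; `gc_stream` and its `fill` callback are hypothetical stand-ins for the underlying stream.

```c
#include <stddef.h>

#define GC_CHUNK 4096   /* refill size suggested in the text */

/* Hypothetical buffered reader; `fill' stands in for the underlying
   stream read and returns the number of bytes read, 0 at end of data. */
typedef struct {
  size_t (*fill) (void *ctx, unsigned char *dst, size_t max);
  void *ctx;
  unsigned char vec[GC_CHUNK];  /* the vector kept as a stream property */
  size_t index, len;            /* next slot to read; number of valid slots */
} gc_stream;

/* Return the next character, or -1 at end of data. */
static inline int getchar_stream (gc_stream *s)
{
  if (s->index >= s->len)       /* out of range: refill the same vector */
    {
      s->len = s->fill (s->ctx, s->vec, GC_CHUNK);
      s->index = 0;
      if (s->len == 0)
        return -1;
    }
  return s->vec[s->index++];    /* the common case: one array access */
}
```

Reading 10000 characters this way calls `fill` only four times: three refills plus the final call that reports end of data.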

(putchar-stream STREAM CHAR)

This is similar to getchar-stream but it writes data instead of
reading data.
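putchar-stream can use the mirror image of that scheme.  In this hypothetical C sketch, characters accumulate in a fixed vector that is handed to a `flush_fn` callback (standing in for the underlying stream) whenever it fills; close-stream would perform one final flush.

```c
#include <stddef.h>

#define PC_CHUNK 4096

/* Hypothetical buffered writer; `flush_fn' receives each full vector. */
typedef struct {
  void (*flush_fn) (void *ctx, const unsigned char *src, size_t len);
  void *ctx;
  unsigned char vec[PC_CHUNK];
  size_t index;                 /* number of characters waiting in vec */
} pc_stream;

/* Hand any pending characters to the underlying stream. */
static void flush_stream (pc_stream *s)
{
  if (s->index > 0)
    {
      s->flush_fn (s->ctx, s->vec, s->index);
      s->index = 0;
    }
}

static inline void putchar_stream (pc_stream *s, unsigned char c)
{
  if (s->index == PC_CHUNK)     /* vector full: write it out first */
    flush_stream (s);
  s->vec[s->index++] = c;
}
```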

Function make-stream

There are actually two stream-creation functions, which are:

(make-input-stream TYPE PROPERTIES)
(make-output-stream TYPE PROPERTIES)

These can be used to create a stream that reads data, or writes data,
respectively.  PROPERTIES is a property list and the allowable
properties in it are defined by the type.  Possible types are:

(1) `file' (this reads data from a file or writes to a file)

    Allowable properties are:

    :file-name (the name of the file)

    :create (for output streams only, creates the file if it doesn't
    already exist)

    :exclusive (for output streams only, fails if the file already
    exists)

    :append (for output streams only; starts appending to the end
    of the file rather than overwriting the file)

    :offset (the position in bytes in the file where reading or writing
    should begin.  If unspecified, defaults to the beginning of the
    file, or to the end of the file when :append is specified)

    :count (for input streams only, the number of bytes to read from
    the file before signaling "end of file".  If nil or omitted, the
    number of bytes is unlimited)

    :non-blocking (if true, reads or writes will fail if the operation
    would block.  This only makes sense for non-regular files).
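As an illustration only, here is one plausible mapping of the `file' properties onto POSIX open(2) flags.  The `file_props` struct is hypothetical, and the proposal does not say that the implementation uses open(2).  Because O_EXCL is only meaningful together with O_CREAT, :exclusive is assumed to imply creation.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical decoded form of the `file' stream properties. */
typedef struct {
  bool output;                  /* make-output-stream vs make-input-stream */
  bool create, exclusive, append, non_blocking;
} file_props;

int file_props_to_open_flags (const file_props *p)
{
  int flags = p->output ? O_WRONLY : O_RDONLY;
  if (p->output)                /* :create, :exclusive, :append: output only */
    {
      if (p->create)
        flags |= O_CREAT;
      if (p->exclusive)
        flags |= O_CREAT | O_EXCL;   /* fail if the file already exists */
      if (p->append)
        flags |= O_APPEND;
    }
  if (p->non_blocking)
    flags |= O_NONBLOCK;
  return flags;
}
```

Under this mapping, :offset would correspond to an lseek(2) after opening, and :count to a byte limit enforced by the stream itself.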

(2) `process' (For output streams only, send data to a process.)

    Allowable properties are:

    :process (the process object)

(3) `buffer'  (Read from or write to a buffer.)

    Allowable properties are:

    :buffer (the name of the buffer or the buffer object.)

    :start (the position to start reading from or writing to.  If nil,
    use the buffer point.  If true, use the buffer's point and move
    point beyond the end of the data read or written.)

    :end (only for input streams, the position to stop reading at.  If
    nil, continue to the end of the buffer.)

    :ignore-accessible (if true, the defaults for :start and :end
    ignore any narrowing of the buffer.)

(4) `stream' (read from or write to a lisp stream)

    Allowable properties are:

    :stream (the stream object)

    :offset (the position to begin reading from or writing to)

    :length (for input streams only, the amount of data to read,
    defaulting to the rest of the data in the string.)

    Revise string (for output streams only; if true, the stream is
    resized as necessary to accommodate data written off the end,
    otherwise the writes will fail.)

(5) `memory' (For output streams only, writes data to an internal
    memory buffer.  This is more lightweight than using a Lisp buffer.
    The function memory-stream-string can be used to retrieve the
    accumulated data as a string.)
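A `memory' output stream amounts to a growable byte buffer.  A minimal C sketch with hypothetical names; `mem_stream_string` plays the role of memory-stream-string by returning a NUL-terminated copy of everything written so far.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory output stream: a growable byte buffer. */
typedef struct {
  char *data;
  size_t len, cap;
} mem_stream;

/* Append N bytes from SRC, doubling the capacity as needed.
   Returns 0 on success, -1 if memory is exhausted. */
int mem_stream_write (mem_stream *m, const void *src, size_t n)
{
  if (n == 0)
    return 0;
  if (m->len + n > m->cap)
    {
      size_t cap = m->cap ? m->cap : 64;
      while (cap < m->len + n)
        cap *= 2;
      char *p = realloc (m->data, cap);
      if (!p)
        return -1;
      m->data = p;
      m->cap = cap;
    }
  memcpy (m->data + m->len, src, n);
  m->len += n;
  return 0;
}

/* Like memory-stream-string: a freshly allocated, NUL-terminated copy of
   everything written so far (the caller frees it). */
char *mem_stream_string (const mem_stream *m)
{
  char *s = malloc (m->len + 1);
  if (!s)
    return NULL;
  if (m->len)
    memcpy (s, m->data, m->len);
  s[m->len] = '\0';
  return s;
}
```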

(6) `debugging' (For output streams only, write data to the debugging
    output.)

(7) `stream-device' (During non-interactive invocations only, read
    from or write to the initial stream terminal device.)

(8) `function' (For output streams only, send data by calling a
    function, exactly as with the STREAM argument to the print
    primitive.)

    Allowable properties are:

    :function (the function to call.  As with print, the function is
    called once for each character written, with that character as
    its single argument.)

(9) `marker' (Write data to the location pointed to by a marker and
    move the marker past the data.)

    Allowable properties are:

    :marker (the marker object.)

(10) `decoding' (As an input stream, reads data from another stream and
     decodes it according to a coding system.  As an output stream,
     decodes the data written to it according to a coding system and
     then writes the results to another stream.)

     Properties are:

     :coding-system (the coding system symbol or object that defines
     the decoding.)

     :stream (the stream on the other end.)

(11) `encoding' (As an input stream, reads data from another stream and
     encodes it according to a coding system.  As an output stream,
     encodes the data written to it according to a coding system and
     then writes the results to another stream.)

     Properties are:

     :coding-system (the coding system symbol or object that defines
     the encoding.)

     :stream (the stream on the other end.)
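     The layering of an `encoding' output stream on top of another
     stream can be sketched in C with a hypothetical one-method output
     interface.  Hex encoding stands in for a real coding system; the
     point is only that bytes written to the encoding stream are
     transformed and then passed on to the stream given as :stream.

```c
#include <stddef.h>

/* Hypothetical minimal output-stream interface: anything with a write method. */
typedef struct out_stream out_stream;
struct out_stream {
  int (*write) (out_stream *self, const unsigned char *data, size_t len);
};

/* An `encoding'-style output stream: encodes what is written to it and
   forwards the result to the stream on the other end (:stream). */
typedef struct {
  out_stream base;              /* must be first so the cast below is valid */
  out_stream *other_end;
} hex_encoding_stream;

static int hex_write (out_stream *self, const unsigned char *data, size_t len)
{
  static const char digits[] = "0123456789abcdef";
  hex_encoding_stream *h = (hex_encoding_stream *) self;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char pair[2] = { digits[data[i] >> 4], digits[data[i] & 0xf] };
      if (h->other_end->write (h->other_end, pair, 2) < 0)
        return -1;              /* propagate errors from the other end */
    }
  return 0;
}

void make_hex_encoding_stream (hex_encoding_stream *h, out_stream *other_end)
{
  h->base.write = hex_write;
  h->other_end = other_end;
}
```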

Consider

(define-stream-type 'type
  :read-function
  :write-function
  :rewind-
  :seek-
  :tell-
  (?:buffer)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
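The define-stream-type idea above could be modelled at the C level as a per-type method table, where a missing method means the operation is unsupported.  In this hypothetical sketch, stream-seekable-p is true only when a type supplies both seek and tell, and seek-stream fails for other types, as the spec requires.  The field names mirror the keywords in the notes; `stream_type` and the example string type are assumptions, not existing XEmacs code.

```c
#include <stddef.h>
#include <string.h>

typedef struct stream stream;

/* Hypothetical method table; a NULL entry means "not supported". */
typedef struct {
  const char *name;
  long (*read) (stream *s, void *dst, size_t n);
  long (*write) (stream *s, const void *src, size_t n);
  int (*rewind) (stream *s);
  int (*seek) (stream *s, long pos);
  long (*tell) (stream *s);
} stream_type;

struct stream {
  const stream_type *type;
  void *state;                  /* type-specific data */
};

/* stream-seekable-p: true only if the type supplies both seek and tell. */
int stream_seekable_p (const stream *s)
{
  return s->type->seek != NULL && s->type->tell != NULL;
}

/* seek-stream: fails (returns -1) for types without a seek method. */
int seek_stream (stream *s, long pos)
{
  return stream_seekable_p (s) ? s->type->seek (s, pos) : -1;
}

/* Example type: a read-only view of a fixed string. */
typedef struct { const char *data; size_t len, pos; } string_state;

static long string_read (stream *s, void *dst, size_t n)
{
  string_state *st = s->state;
  size_t avail = st->len - st->pos;
  if (n > avail)
    n = avail;
  memcpy (dst, st->data + st->pos, n);
  st->pos += n;
  return (long) n;
}

static int string_seek (stream *s, long pos)
{
  string_state *st = s->state;
  if (pos < 0 || (size_t) pos > st->len)
    return -1;
  st->pos = (size_t) pos;
  return 0;
}

static long string_tell (stream *s) { return (long) ((string_state *) s->state)->pos; }
static int string_rewind (stream *s) { return string_seek (s, 0); }

const stream_type string_stream_type = {
  "string", string_read, NULL, string_rewind, string_seek, string_tell
};
```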

- Generalized Coding Systems

  - Lisp API for Defining Coding Systems

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
User-defined coding systems.

(define-coding-system-type 'type
  :encode-function FUN
  :decode-function FUN
  :detect-function FUN
  :buffering (number = at least this many chars
              line   = buffer up to end of line
              regexp = buffer until this regexp is found in the source
                       data; the match data will be set appropriately
                       when FUN is called))
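The `line' value of :buffering could be implemented by accumulating input and calling FUN only with complete lines, keeping any trailing partial line until more data arrives.  A hypothetical C sketch, using a fixed-size buffer for brevity where a real implementation would grow it:

```c
#include <stddef.h>

/* Hypothetical callback standing in for the coding system's FUN. */
typedef void (*line_fun) (const char *line, size_t len, void *ctx);

typedef struct {
  char pending[256];            /* the current, incomplete line */
  size_t len;
} line_buffer;

/* Feed N bytes; FUN sees each complete line, including its newline.
   Returns -1 if a line exceeds the sketch's fixed buffer. */
int line_buffer_feed (line_buffer *b, const char *data, size_t n,
                      line_fun fun, void *ctx)
{
  for (size_t i = 0; i < n; i++)
    {
      if (b->len == sizeof b->pending)
        return -1;
      b->pending[b->len++] = data[i];
      if (data[i] == '\n')
        {
          fun (b->pending, b->len, ctx);
          b->len = 0;
        }
    }
  return 0;
}
```

Feeding "ab\ncd" and then "e\nf" calls FUN with "ab\n" and "cde\n", leaving "f" pending.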

The encode function is called as

(encode INSTREAM OUTSTREAM)

It should read data from INSTREAM and write the converted result to
OUTSTREAM.  It can leave some data unread in INSTREAM; that data will
reappear the next time it is called.  Generally, there is a finite
amount of data in INSTREAM, and further attempts to read lead to
would-block errors or return values.  It can use INSTREAM properties
to record state, and may use the read-stream functionality to read
everything into a vector or string.
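As an example of an encode function that leaves data behind, a base64 encoder can only convert complete 3-byte groups; the remaining 0 to 2 bytes stay in the input until more data arrives or the input ends.  A C sketch with hypothetical function names (the encoding itself is standard base64 as in RFC 4648):

```c
#include <stddef.h>

static const char b64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode only the complete 3-byte groups of IN into OUT (4 chars each).
   Returns the number of input bytes consumed; the 0-2 leftover bytes are
   left for the next call.  OUT must hold at least 4 * (n / 3) bytes. */
size_t base64_encode_chunk (const unsigned char *in, size_t n, char *out,
                            size_t *out_len)
{
  size_t groups = n / 3;
  for (size_t g = 0; g < groups; g++)
    {
      const unsigned char *p = in + 3 * g;
      unsigned long v = ((unsigned long) p[0] << 16) | (p[1] << 8) | p[2];
      out[4 * g]     = b64[(v >> 18) & 63];
      out[4 * g + 1] = b64[(v >> 12) & 63];
      out[4 * g + 2] = b64[(v >> 6) & 63];
      out[4 * g + 3] = b64[v & 63];
    }
  *out_len = 4 * groups;
  return 3 * groups;
}

/* At end of data: encode the final 1 or 2 bytes with '=' padding. */
size_t base64_encode_final (const unsigned char *in, size_t n, char *out)
{
  if (n == 0)
    return 0;
  unsigned long v = (unsigned long) in[0] << 16;
  if (n == 2)
    v |= (unsigned long) in[1] << 8;
  out[0] = b64[(v >> 18) & 63];
  out[1] = b64[(v >> 12) & 63];
  out[2] = n == 2 ? b64[(v >> 6) & 63] : '=';
  out[3] = '=';
  return 4;
}
```

In the Lisp version, the leftover bytes would simply stay unread in INSTREAM, and the final padded group would be emitted once INSTREAM reports end of data.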

-> Need vectors and strings to be resizable from Lisp where
   necessary.

  

Discussion

Ben sez:

From: Ben Wing <ben@666.com>
Sender: owner-xemacs-beta@xemacs.org
To: "Alastair J. Houghton" <ajhoughton@lineone.net>
CC: xemacs-beta@xemacs.org
Message-ID: <3973766F.197819DE@666.com>
Subject: Re: Lstreams and Lisp
Date: Mon, 17 Jul 2000 14:11:11 -0700
X-Mailer: Mozilla 4.73 [en] (Windows NT 5.0; U)

alastair, this looks great!  please continue.

as for lstreams, originally i wanted them not to escape because there
were no lisp accessors, and there may be [or might have been,
conceivably] primitives that could take lstreams as arguments and
might [conceivably ...?] not work if some strange lstream were passed
in.

but i've actually been thinking of creating an lstream interface
myself, for use in creating arbitrary lisp coding systems [i want to
extend the coding-system interface to work not just with international
encodings but to be able to handle gzip, base64, md5, etc.].

[unfortunately, what you're working on now doesn't fit into this
system because the latter only deals with strings/streams of text or
binary data and not arbitrary lisp objects; although i can certainly
see the usefulness of an arbitrary lisp object converter, and it looks
like that's exactly what you're working on here.  the coding system
stuff would still be useful because it includes various optimizations
for working specifically with streams; but eventually i would really
like to see the interfaces merged.  e.g. why couldn't you `find-file'
using a coding system that generated sound and image objects mixed in
with the text?  that's exactly what modern html browsers do, in
essence.  i suppose i should extend the coding system interface to
allow text marked up with extents; still ...]

i'm appending a rather raw writeup of my proposed lstream interface,
with some bits on extending the coding system mechanism. [this comes
out of a massive document of such proposals that martin and i sent to
japan a few months ago as part of the contract that he and i are
getting from them.  most of this is stuff he transcribed from
scribbled notes i faxed to him, since i can't type too well any more
but can still write more or less; and the rest of it i dictated to a
professional transcriptionist [with no technical knowledge, of
course!], and was cleaned up by martin.  that's why it's so messy.]

if you're interested in implementing this lstream interface or
something like it, please go ahead!  i've got my hands busy with mule
work and merging of existing workspaces into the code base for quite
some time now.

btw when you have time you might want to extend your 'string' encoding
to allow encoding/decoding using a coding system, which would almost
certainly be required when the string contains non-ascii
characters. [e.g. when sending text to an x selection, `ctext' is
required, and for windows, `mswindows-tstr'.]

also, you might consider adding funs that allow creating a
user-defined "encoding", instead of specifying the conversion
functions directly.

  

This is in response to a post by Alastair J. Houghton <ajhoughton@lineone.net>, part of which appears below:

Why does it say

/* #define CHECK_LSTREAM(x) CHECK_RECORD (x, lstream)
   Lstream pointers should never escape to the Lisp level, so
   functions should not be doing this. */

in lstream.h? The reason I'm wondering is that I'd like to
make my encode-binary and decode-binary functions work
with arbitrary output sinks/input sources, so the obvious
implementation is to create a suitable Lstream within the
Lisp-visible functions. The trouble is that I want the
interface to the functions to include the facility to add
user-defined conversions, which means that a Lisp function
may have to accept an Lstream parameter... so I'm wondering
whether there was any reason for this comment ;-)

Just in case you're interested, here's the interface I'm
proposing (there'll be an additional encode-binary-string
function that works in an efficient way). The STREAM parameter
could accept any Lisp object for which an Lstream can be
created.

DEFUN ("encode-binary-stream", Fencode_binary_stream, 3, 3, 0, /*
Encode the sequence DATA into a binary STREAM using the specified
binary FORMAT vector.  Each element of the FORMAT vector should either
be a symbol, or a list of the form (SYMBOL PARAMETER...). SYMBOL may be
one of

  binary  string  bit-vector  integer  float  space  vector

or alternatively the name of a Lisp function that will be called with the
remaining data, the output stream and a list of PARAMETER values as its
arguments.  i.e. its declaration should look something like the following

 (defun my-conversion (data stream parameter-list) ...)

and it will be called using

 (my-conversion data stream parameter-list)

Such a function should return the remaining data after it has consumed
whatever it required.

The built-in encodings support the following parameters:

  Encoding    Parameters

  binary      :length
  string      :length :pad :terminator
  bit-vector  :length :direction
  integer     :length :signed :direction
  float       :length :format :direction
  space       :length
  vector      FORMAT-ELT :length :pack

where

  FORMAT-ELT  is anything that could be an element of the FORMAT parameter.

  :length     is followed by a length in bytes (or in bits for bit-vector).

  :pad        is followed by a character used to pad the string to the
              specified length.

  :terminator is followed by a character used to terminate the string.

  :direction  is followed by one of `big-endian', `little-endian', `host'
              or `network'.

  :signed     is followed by t or nil.

  :format     is followed by `native'. Conversions to other floating point
              formats are currently not supported.

  :pack       is followed by an integer specifying the vector stride
              (e.g. the format [(vector (integer :length 2) :pack 4)]
              represents an array of 16-bit integers, but with a gap
              of 2 bytes between successive elements).

The function returns a string containing the raw binary data. */
       (format, data, stream))
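To make the :length, :direction and :pack parameters concrete, here is a C sketch of how the format [(vector (integer :length 2) :pack 4)] might be encoded.  The function names and the zero-filling of the gap bytes are assumptions (the post does not say what fills the gap).  `network' order is big-endian, and `host' would resolve to one of the two orders at run time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { ORDER_BIG, ORDER_LITTLE } byte_order;  /* `network' == ORDER_BIG */

/* Store VALUE in LENGTH bytes (1 <= LENGTH <= 8); negative values come
   out as two's complement. */
static void put_integer (unsigned char *dst, int64_t value, size_t length,
                         byte_order order)
{
  uint64_t u = (uint64_t) value;          /* conversion is well defined */
  for (size_t i = 0; i < length; i++)
    {
      unsigned char byte = (unsigned char) (u >> (8 * i));  /* i-th lowest byte */
      if (order == ORDER_BIG)
        dst[length - 1 - i] = byte;
      else
        dst[i] = byte;
    }
}

/* Element k starts at byte k * STRIDE; the STRIDE - LENGTH gap bytes after
   it are zero-filled.  Returns the number of bytes produced. */
size_t encode_packed_integers (unsigned char *dst, const int64_t *values,
                               size_t count, size_t length, size_t stride,
                               byte_order order)
{
  memset (dst, 0, count * stride);
  for (size_t k = 0; k < count; k++)
    put_integer (dst + k * stride, values[k], length, order);
  return count * stride;
}
```

With the values 1 and -2, big-endian order yields the bytes 00 01 00 00 FF FE 00 00.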

DEFUN ("decode-binary-stream", Fdecode_binary_stream, 2, 2, 0, /*
Decode the specified STREAM using the binary FORMAT vector. See
`encode-binary-stream' for more information on the FORMAT vector;
note however that user-defined conversion functions should be declared
as

  (defun my-conversion (stream parameter-list) ...)

and should return the data they have converted. */
         (format, stream))
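Decoding reverses the process.  The interesting case is :signed, where the top bit of the most significant byte must be sign-extended.  A hypothetical C sketch of decoding a single integer element:

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ORDER_BIG, ORDER_LITTLE } byte_order;  /* `network' == ORDER_BIG */

/* Read a LENGTH-byte integer (1 <= LENGTH <= 8) from SRC.  If IS_SIGNED,
   the top bit of the most significant byte is a two's-complement sign. */
int64_t get_integer (const unsigned char *src, size_t length, int is_signed,
                     byte_order order)
{
  uint64_t u = 0;
  for (size_t i = 0; i < length; i++)   /* most significant byte first */
    {
      unsigned char byte = order == ORDER_BIG ? src[i] : src[length - 1 - i];
      u = (u << 8) | byte;
    }
  if (is_signed && length < 8 && ((u >> (8 * length - 1)) & 1))
    u |= ~(uint64_t) 0 << (8 * length);  /* sign-extend */
  return (int64_t) u;
}
```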

  

Closed bugs

None.

 
 
