[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
31.1 Introduction to Extents | Extents are ranges over text, with properties. | |
31.2 Extent Ordering | How extents are ordered internally. | |
31.3 Format of the Extent Info | The extent information in a buffer or string. | |
31.4 Zero-Length Extents | A weird special case. | |
31.5 Mathematics of Extent Ordering | A rigorous foundation. | |
31.6 Extent Fragments | Cached information useful for redisplay. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Extents are regions over a buffer, with a start and an end position denoting the region of the buffer included in the extent. In addition, either end can be closed or open, meaning that the endpoint is or is not logically included in the extent. Insertion of a character at a closed endpoint causes the character to go inside the extent; insertion at an open endpoint causes the character to go outside.
Extent endpoints are stored using memory indices (see ‘insdel.c’), to minimize the amount of adjusting that needs to be done when characters are inserted or deleted.
(Formerly, extent endpoints at the gap could be either before or after the gap, depending on the open/closedness of the endpoint. The intent of this was to make it so that insertions would automatically go inside or out of extents as necessary with no further work needing to be done. It didn’t work out that way, however, and just ended up complexifying and buggifying all the rest of the code.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Extents are compared using memory indices. There are two orderings for extents and both orders are kept current at all times. The normal or display order is as follows:
Extent A is ``less than'' extent B, that is, earlier in the display order, if: A-start < B-start, or if: A-start = B-start, and A-end > B-end |
So if two extents begin at the same position, the larger of them is the
earlier one in the display order (EXTENT_LESS
is true).
For the e-order, the same thing holds:
Extent A is ``less than'' extent B in e-order, that is, later in the buffer, if: A-end < B-end, or if: A-end = B-end, and A-start > B-start |
So if two extents end at the same position, the smaller of them is the
earlier one in the e-order (EXTENT_E_LESS
is true).
The display order and the e-order are complementary orders: any theorem about the display order also applies to the e-order if you swap all occurrences of “display order” and “e-order”, “less than” and “greater than”, and “extent start” and “extent end”.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
An extent-info structure consists of a list of the buffer or string’s extents and a stack of extents that lists all of the extents over a particular position. The stack-of-extents info is used for optimization purposes—it basically caches some info that might be expensive to compute. Certain otherwise hard computations are easy given the stack of extents over a particular position, and if the stack of extents over a nearby position is known (because it was calculated at some prior point in time), it’s easy to move the stack of extents to the proper position.
Given that the stack of extents is an optimization, and given that
it requires memory, a string’s stack of extents is wiped out each
time a garbage collection occurs. Therefore, any time you retrieve
the stack of extents, it might not be there. If you need it to
be there, use the _force
version.
Similarly, a string may or may not have an extent_info structure.
(Generally it won’t if there haven’t been any extents added to the
string.) So use the _force
version if you need the extent_info
structure to be there.
A list of extents is maintained as a double gap array. One gap array is ordered by start index (the display order) and the other is ordered by end index (the e-order). Note that positions in an extent list should logically be conceived of as referring to a particular extent (as is the norm in programs) rather than sitting between two extents. Note also that callers of these functions should not be aware of the fact that the extent list is implemented as an array, except for the fact that positions are integers (this should be generalized to handle integers and linked list equally well).
A gap array is the same structure used by buffer text: an array of elements with a "gap" somewhere in the middle. Insertion and deletion happens by moving the gap to the insertion/deletion point, and then expanding/contracting as necessary. Gap arrays have a number of useful properties:
An alternative would be balanced binary trees, which have guaranteed O(log N) time for all operations (although the constant factors are not as good, and repeated localized operations will be slower than for a gap array). Such code is quite tricky to write, however.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Extents can be zero-length, and will end up that way if their endpoints
are explicitly set that way or if their detachable property is nil
and all the text in the extent is deleted. (The exception is open-open
zero-length extents, which are barred from existing because there is
no sensible way to define their properties. Deletion of the text in
an open-open extent causes it to be converted into a closed-open
extent.) Zero-length extents are primarily used to represent
annotations, and behave as follows:
Note that closed-open, non-detachable zero-length extents behave exactly like markers and that open-closed, non-detachable zero-length extents behave like the “point-type” marker in Mule.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The extents in a buffer are ordered by “display order” because that is that order that the redisplay mechanism needs to process them in. The e-order is an auxiliary ordering used to facilitate operations over extents. The operations that can be performed on the ordered list of extents in a buffer are
(4) requires being able to determine the first and last extents that overlap a range.
NOTE: overlap is used as follows:
map-extents
maps over, since map-extents
sometimes pays attention to whether the endpoints of an extents are open
or closed. But for our purposes, it greatly simplifies things to treat
all extents as having closed endpoints.
First, define >, <, <=, etc. as applied to extents to mean comparison according to the display order. Comparison between an extent E and an index I means comparison between E and the range [I, I].
Also define e>, e<, e<=, etc. to mean comparison according to the e-order.
For any range R, define R(0) to be the starting index of the range and R(1) to be the ending index of the range.
For any extent E, define E(next) to be the extent directly following E, and E(prev) to be the extent directly preceding E. Assume E(next) and E(prev) can be determined from E in constant time. (This is because we store the extent list as a doubly linked list.)
Similarly, define E(e-next) and E(e-prev) to be the extents directly following and preceding E in the e-order.
Now:
Let R be a range. Let F be the first extent overlapping R. Let L be the last extent overlapping R.
Theorem 1: R(1) lies between L and L(next), i.e. L <= R(1) < L(next).
This follows easily from the definition of display order. The basic reason that this theorem applies is that the display order sorts by increasing starting index.
Therefore, we can determine L just by looking at where we would insert R(1) into the list, and if we know F and are moving forward over extents, we can easily determine when we’ve hit L by comparing the extent we’re at to R(1).
Theorem 2: F(e-prev) e< [1, R(0)] e<= F. |
This is the analog of Theorem 1, and applies because the e-order sorts by increasing ending index.
Therefore, F can be found in the same amount of time as operation (1), i.e. the time that it takes to locate where an extent would go if inserted into the e-order list. This is O(log N), since we are using gap arrays to manage extents.
Define a stack of extents (or SOE) as the set of extents (ordered in display order and e-order, just like for normal extent lists) that overlap an index I.
Now:
Let I be an index, let S be the stack of extents on I and let F be the first extent in S.
Theorem 3: The first extent in S is the first extent that overlaps any range [I, J].
Proof: Any extent that overlaps [I, J] but does not include I must have a start index > I, and thus be greater than any extent in S.
Therefore, finding the first extent that overlaps a range R is the same as finding the first extent that overlaps R(0).
Theorem 4: Let I2 be an index such that I2 > I, and let F2 be the first extent that overlaps I2. Then, either F2 is in S or F2 is greater than any extent in S.
Proof: If F2 does not include I then its start index is greater than I and thus it is greater than any extent in S, including F. Otherwise, F2 includes I and thus is in S, and thus F2 >= F.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Imagine that the buffer is divided up into contiguous, non-overlapping runs of text such that no extent starts or ends within a run (extents that abut the run don’t count).
An extent fragment is a structure that holds data about the run that contains a particular buffer position (if the buffer position is at the junction of two runs, the run after the position is used)—the beginning and end of the run, a list of all of the extents in that run, the merged face that results from merging all of the faces corresponding to those extents, the begin and end glyphs at the beginning of the run, etc. This is the information that redisplay needs in order to display this run.
Extent fragments have to be very quick to update to a new buffer position when moving linearly through the buffer. They rely on the stack-of-extents code, which does the heavy-duty algorithmic work of determining which extents overly a particular position.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.