31. Extents

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

31.1 Introduction to Extents

Extents are regions over a buffer, with a start and an end position denoting the region of the buffer included in the extent. In addition, either end can be closed or open, meaning that the endpoint is or is not logically included in the extent. Insertion of a character at a closed endpoint causes the character to go inside the extent; insertion at an open endpoint causes the character to go outside.

Extent endpoints are stored using memory indices (see ‘insdel.c’), to minimize the amount of adjusting that needs to be done when characters are inserted or deleted.

(Formerly, extent endpoints at the gap could be either before or after the gap, depending on the open/closedness of the endpoint. The intent of this was to make it so that insertions would automatically go inside or out of extents as necessary with no further work needing to be done. It didn’t work out that way, however, and just ended up complexifying and buggifying all the rest of the code.)

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

31.2 Extent Ordering

Extents are compared using memory indices. There are two orderings for extents and both orders are kept current at all times. The normal or display order is as follows:

Extent A is ``less than'' extent B,
that is, earlier in the display order,
  if:    A-start < B-start,
  or if: A-start = B-start, and A-end > B-end

So if two extents begin at the same position, the larger of them is the earlier one in the display order (EXTENT_LESS is true).

For the e-order, the same thing holds:

Extent A is ``less than'' extent B in e-order,
that is, later in the buffer,
  if:    A-end < B-end,
  or if: A-end = B-end, and A-start > B-start

So if two extents end at the same position, the smaller of them is the earlier one in the e-order (EXTENT_E_LESS is true).

The display order and the e-order are complementary orders: any theorem about the display order also applies to the e-order if you swap all occurrences of “display order” and “e-order”, “less than” and “greater than”, and “extent start” and “extent end”.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

31.3 Format of the Extent Info

An extent-info structure consists of a list of the buffer or string’s extents and a stack of extents that lists all of the extents over a particular position. The stack-of-extents info is used for optimization purposes—it basically caches some info that might be expensive to compute. Certain otherwise hard computations are easy given the stack of extents over a particular position, and if the stack of extents over a nearby position is known (because it was calculated at some prior point in time), it’s easy to move the stack of extents to the proper position.

Given that the stack of extents is an optimization, and given that it requires memory, a string’s stack of extents is wiped out each time a garbage collection occurs. Therefore, any time you retrieve the stack of extents, it might not be there. If you need it to be there, use the _force version.

Similarly, a string may or may not have an extent_info structure. (Generally it won’t if there haven’t been any extents added to the string.) So use the _force version if you need the extent_info structure to be there.

A list of extents is maintained as a double gap array. One gap array is ordered by start index (the display order) and the other is ordered by end index (the e-order). Note that positions in an extent list should logically be conceived of as referring to a particular extent (as is the norm in programs) rather than sitting between two extents. Note also that callers of these functions should not be aware of the fact that the extent list is implemented as an array, except for the fact that positions are integers (this should be generalized to handle integers and linked list equally well).

A gap array is the same structure used by buffer text: an array of elements with a "gap" somewhere in the middle. Insertion and deletion happens by moving the gap to the insertion/deletion point, and then expanding/contracting as necessary. Gap arrays have a number of useful properties:

They are space efficient, as there is no need for next/previous pointers.
If the items in them are sorted, locating an item is fast – O(log N).
Insertion and deletion is very fast (constant time, essentially) if the gap is near (which favors localized operations, as will usually be the case). Even if not, it requires only a block move of memory, which is generally a highly optimized operation on modern processors.
Code to manipulate them is relatively simple to write.

An alternative would be balanced binary trees, which have guaranteed O(log N) time for all operations (although the constant factors are not as good, and repeated localized operations will be slower than for a gap array). Such code is quite tricky to write, however.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

31.4 Zero-Length Extents

Extents can be zero-length, and will end up that way if their endpoints are explicitly set that way or if their detachable property is nil and all the text in the extent is deleted. (The exception is open-open zero-length extents, which are barred from existing because there is no sensible way to define their properties. Deletion of the text in an open-open extent causes it to be converted into a closed-open extent.) Zero-length extents are primarily used to represent annotations, and behave as follows:

Insertion at the position of a zero-length extent expands the extent if both endpoints are closed; goes after the extent if it is closed-open; and goes before the extent if it is open-closed.
Deletion of a character on a side of a zero-length extent whose corresponding endpoint is closed causes the extent to be detached if it is detachable; if the extent is not detachable or the corresponding endpoint is open, the extent remains in the buffer, moving as necessary.

Note that closed-open, non-detachable zero-length extents behave exactly like markers and that open-closed, non-detachable zero-length extents behave like the “point-type” marker in Mule.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

31.5 Mathematics of Extent Ordering

The extents in a buffer are ordered by “display order” because that is that order that the redisplay mechanism needs to process them in. The e-order is an auxiliary ordering used to facilitate operations over extents. The operations that can be performed on the ordered list of extents in a buffer are

Locate where an extent would go if inserted into the list.
Insert an extent into the list.
Remove an extent from the list.
Map over all the extents that overlap a range.

(4) requires being able to determine the first and last extents that overlap a range.

NOTE: overlap is used as follows:

two ranges overlap if they have at least one point in common. Whether the endpoints are open or closed makes a difference here.
a point overlaps a range if the point is contained within the range; this is equivalent to treating a point P as the range [P, P].
In the case of an extent overlapping a point or range, the extent is normally treated as having closed endpoints. This applies consistently in the discussion of stacks of extents and such below. Note that this definition of overlap is not necessarily consistent with the extents that map-extents maps over, since map-extents sometimes pays attention to whether the endpoints of an extents are open or closed. But for our purposes, it greatly simplifies things to treat all extents as having closed endpoints.

First, define >, <, <=, etc. as applied to extents to mean comparison according to the display order. Comparison between an extent E and an index I means comparison between E and the range [I, I].

Also define e>, e<, e<=, etc. to mean comparison according to the e-order.

For any range R, define R(0) to be the starting index of the range and R(1) to be the ending index of the range.

For any extent E, define E(next) to be the extent directly following E, and E(prev) to be the extent directly preceding E. Assume E(next) and E(prev) can be determined from E in constant time. (This is because we store the extent list as a doubly linked list.)

Similarly, define E(e-next) and E(e-prev) to be the extents directly following and preceding E in the e-order.

Now:

Let R be a range. Let F be the first extent overlapping R. Let L be the last extent overlapping R.

Theorem 1: R(1) lies between L and L(next), i.e. L <= R(1) < L(next).

This follows easily from the definition of display order. The basic reason that this theorem applies is that the display order sorts by increasing starting index.

Therefore, we can determine L just by looking at where we would insert R(1) into the list, and if we know F and are moving forward over extents, we can easily determine when we’ve hit L by comparing the extent we’re at to R(1).

Theorem 2: F(e-prev) e< [1, R(0)] e<= F.

This is the analog of Theorem 1, and applies because the e-order sorts by increasing ending index.

Therefore, F can be found in the same amount of time as operation (1), i.e. the time that it takes to locate where an extent would go if inserted into the e-order list. This is O(log N), since we are using gap arrays to manage extents.

Define a stack of extents (or SOE) as the set of extents (ordered in display order and e-order, just like for normal extent lists) that overlap an index I.

Now:

Let I be an index, let S be the stack of extents on I and let F be the first extent in S.

Theorem 3: The first extent in S is the first extent that overlaps any range [I, J].

Proof: Any extent that overlaps [I, J] but does not include I must have a start index > I, and thus be greater than any extent in S.

Therefore, finding the first extent that overlaps a range R is the same as finding the first extent that overlaps R(0).

Theorem 4: Let I2 be an index such that I2 > I, and let F2 be the first extent that overlaps I2. Then, either F2 is in S or F2 is greater than any extent in S.

Proof: If F2 does not include I then its start index is greater than I and thus it is greater than any extent in S, including F. Otherwise, F2 includes I and thus is in S, and thus F2 >= F.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

31.6 Extent Fragments

Imagine that the buffer is divided up into contiguous, non-overlapping runs of text such that no extent starts or ends within a run (extents that abut the run don’t count).

An extent fragment is a structure that holds data about the run that contains a particular buffer position (if the buffer position is at the junction of two runs, the run after the position is used)—the beginning and end of the run, a list of all of the extents in that run, the merged face that results from merging all of the faces corresponding to those extents, the begin and end glyphs at the beginning of the run, etc. This is the information that redisplay needs in order to display this run.

Extent fragments have to be very quick to update to a new buffer position when moving linearly through the buffer. They rely on the stack-of-extents code, which does the heavy-duty algorithmic work of determining which extents overly a particular position.

[ << ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.

31.1 Introduction to Extents		Extents are ranges over text, with properties.
31.2 Extent Ordering		How extents are ordered internally.
31.3 Format of the Extent Info		The extent information in a buffer or string.
31.4 Zero-Length Extents		A weird special case.
31.5 Mathematics of Extent Ordering		A rigorous foundation.
31.6 Extent Fragments		Cached information useful for redisplay.