
CHAPTER 7
CROSS-REFERENCES, SORTING, AND FORMATING

Once index entries have been generated, three further processes go into producing an index display: 1. addition of cross-references and other index elements; 2. sorting the index elements; and 3. formating the display.

7.1 CROSS-REFERENCES AND OTHER INDEX ELEMENTS

After index entries, cross-references are by far the most common elements in string indexes. A cross-reference is made up of three basic parts: the first is similar to an index string; the second is a special sort of connective; and the third indicates an index element or family of index elements elsewhere in the index. A simple example is:
| (1) COMPUTERS | (2) : see also | (3) MICROCOMPUTERS |
The connective gives the searcher an instruction. For example, "see also" says: "If you are looking for index entries beginning with the first part, perhaps you will find useful information in the part of the index specified by the last part". The connective "see" is used in a similar way, except that it also points out that no index entries begin with the first part.

An index entry represents a complete path leading searchers from an access term through one or more intermediate stages to a location outside the index; by contrast, a cross-reference corresponds to only the beginning of such a path. For example, the back-of-book index entry

SEA as DEMONIC IMAGE : 146
represents a complete path from the access term "SEA" to page 146. By contrast, the cross-reference
SEA : see also IMAGERY
corresponds to the beginning of a path; rather than leading searchers all the way out of the index, it directs them to another part of the index, which contains index entries starting with "IMAGERY".

Searchers can continue in the direction indicated by a cross-reference only after performing a kind of detour in order to find one or more of the index elements to which the cross-reference refers them. Such a detour obviously requires effort of the searcher. If searchers follow cross-references, the effort required implies a loss of efficiency. If searchers instead choose to avoid the effort by not following cross-references, recall may suffer.

Cross-references do have advantages. A cross-referenced index is often less bulky than one providing index strings to cover the various possible directions from which searchers might approach an indexed item. Because of decreased bulk, searchers may find the cross-referenced index easier to search.

The use of "see" cross-references generally implies that some terms are chosen as preferred terms for use in descriptions; "see" references are made to the preferred terms from equivalent terms which might be useful access terms, but which are not preferred. Thus, direct access is provided under one term and indirect access under others. Consistent preference for certain terms in index strings may increase predictability to such an extent that experienced searchers do not even need the "see" references. For example, after some experience, a searcher might start to remember that the preferred term for merchandise is "GOODS" and not even bother to look up "MERCHANDISE".

In string indexing, "see also" cross-references are commonly employed to avoid the need for lead-only terms; "see" references, the need for alternate access terms. For example, instead of the two index strings

COMPUTERS. MICROCOMPUTERS. USE in EDUCATION
and
MICROCOMPUTERS. USE in EDUCATION
a string index may contain only the second index string and the cross-reference
COMPUTERS : see also MICROCOMPUTERS
Likewise, a string index using the PRECIS authority file does not have the two index strings
Goods. Eastern Europe
    Physical distribution
and
Merchandise. Eastern Europe
    Physical distribution
Rather, it has only the first index string and the cross-reference
Merchandise See Goods

In an online system, aids can be introduced to help searchers deal with cross-references more efficiently. To take just one example, the index display for the searcher's starting term can be in one "window" on the screen while the display for a cross-referenced term is in another window; the searcher then does not have to remember, or write out by hand, the list of cross-references.

7.1.1 Simple Cross-references

Both the first and the third parts of a simple cross-reference are single terms. Simple cross-references take up little space in an index display and are generally easy for searchers to understand. For example, the CIFT software uses its thesaurus to generate automatically the simple cross-references required for any access term in an input string. The access term "Folk rituals" would call for cross-references such as:
ETHNOMUSICOLOGY
See also related term: Folk rituals.

FOLK CUSTOMS
Use: Folk rituals.

FOLK LITERATURE
See also related term: Folk rituals.

HISTORY AND STUDY OF FOLKLORE
See also related term: Folk rituals.

MATERIAL CULTURE
See also related term: Folk rituals.

RITUALS
See also narrower term: Folk rituals.
In CIFT, "use" is employed instead of "see", and "see also" is divided into two types, one for pointing to a narrower term and the other for pointing to terms related in other ways.
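The generation of such cross-references from a thesaurus can be sketched as follows. This is only an illustration: the thesaurus layout, the relation names, and the function `simple_cross_references` are assumptions made for the example and do not reproduce CIFT's actual data format or code.

```python
# Hypothetical thesaurus layout: for each access term, the terms from which
# a cross-reference should point to it, grouped by relation type.
THESAURUS = {
    "Folk rituals": {
        "broader": ["Rituals"],
        "related": ["Ethnomusicology", "Folk literature",
                    "History and study of folklore", "Material culture"],
        "used_for": ["Folk customs"],
    },
}

# Connective displayed for each relation type (wording taken from the example above).
LABELS = {
    "broader": "See also narrower term",
    "related": "See also related term",
    "used_for": "Use",
}

def simple_cross_references(access_term, thesaurus=THESAURUS):
    """Return (heading, note) pairs pointing to access_term, sorted by heading."""
    relations = thesaurus.get(access_term, {})
    refs = []
    for relation, label in LABELS.items():
        for source in relations.get(relation, []):
            refs.append((source.upper(), f"{label}: {access_term}."))
    return sorted(refs)

for heading, note in simple_cross_references("Folk rituals"):
    print(heading)
    print(note)
    print()
```

Each cross-reference is filed under the term it leads from, so the sketch sorts by heading before printing.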

7.1.2 Cross-references involving multiterm expressions

It is possible to make cross-references not just between single terms, but from or to an expression which contains more than one term. Simple examples from a PRECIS index are:
Ophthalmology
    See also
        Eyes. Man

pH See Concentration. Hydrogen ions

Physiology. Man
    See also
        Sex. Man

Proteins
    See also
        Growth hormones. Animals

Some argument can be made in favour of cross-references involving multiterm expressions: 1. on the grounds of eliminability and collocation, when many cross-references start with the same term; 2. on the grounds of clarity, when the meaning of the third part of the cross-reference may be unfamiliar. On the other hand, these cross-references take longer to read and lead to a bulkier index than do simple cross-references.

Cross-references involving multiterm expressions are fairly rare in PRECIS indexes, but the PRECIS software does make provision for generating them where they are needed. The thesaurus from which the cross-references are generated relates not only individual terms, but also expressions containing more than one term. Each term or expression in the thesaurus has an associated RIN address; and the software generates cross-references, not from access terms occurring in input strings, but from appropriate RINs which are input separately. To produce such effects as the shift out of boldface in an expression such as "Concentration. Hydrogen ions", the expressions stored at RIN addresses are allowed to contain typographic and layout codes.

A type of cross-reference between multiterm expressions unusual in string indexing occurs when the two expressions contain the same terms but in different orders; for example, in Yeats' proposed Statement Indexing,

Alfalfa - hay see Hay/f. Alfalfa

In other types of indexing systems, cross-references involving multiterm expressions may become a substitute for the multiple entries per indexed item characteristic of string indexing. Thus, Current Technology Index (Library Association 1981), which uses a modified form of chain procedure, has many cross-references like

WIRE MESH
  See
    Mesh:Wires
For the same reason, CTI has many cross-references from one term or string to another string containing the first:
WIND TUNNELS
  See also
    Aircraft:Gas turbines:Jets:Noise:Simulation: Wind tunnels
    Aircraft:Testing:Wind tunnels
    Motor vehicles:Scale models:Drag:Testing: Wind tunnels
    Spheres:Drag:Testing:Wind tunnels

WATER:Supplies
  See also
    Refugees' camps:Water:Supplies

7.1.3 Permuted Cross-references

More than one cross-reference may conceivably have the same second and third part, and more or less the same terms in various orders in the first; for example,
  1. Density. Increase by Reduction of Pore Space * SEE Compaction
  2. Pore Space. Reduction. Increase in Density * SEE Compaction
String indexing systems with software which can generate such a set of cross references from a single specification include NEPHIS, LIPHIS, NETPAD, Relational Indexing, and simple systems like KWOC. The software in these systems treats the first part of the cross-reference specification as if it were an index string and the rest as if it were a locator. As an example, the NEPHIS cross-reference specification
@Increase? in <Density>? by <@Reduction? of <Pore Space>> * SEE Compaction
yields the permuted cross-references above.
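The general idea of treating the first part as an index string and the rest as a locator can be sketched as below. The sketch does not implement the NEPHIS coding; it substitutes a much simpler KWOC-like generator, and the names `kwoc_strings`, `permuted_cross_references`, and the stoplist are assumptions made for illustration.

```python
STOPWORDS = {"in", "by", "of"}  # hypothetical stoplist for this sketch

def kwoc_strings(text):
    """KWOC-like stand-in: one string per significant word, followed by the full text."""
    return [f"{word.upper()}: {text}" for word in text.split()
            if word.lower() not in STOPWORDS]

def permuted_cross_references(specification, make_strings=kwoc_strings):
    """Split the specification at ' * ', permute the first part, and carry
    the rest (connective plus target) along unchanged, like a locator."""
    first_part, separator, rest = specification.partition(" * ")
    if not separator:
        raise ValueError("expected ' * ' between first part and connective")
    return [f"{string} * {rest}" for string in make_strings(first_part)]

spec = "Increase in Density by Reduction of Pore Space * SEE Compaction"
for reference in permuted_cross_references(spec):
    print(reference)
```

Any other index string generator could be passed as `make_strings`; the connective and target are never inspected by it.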

NEPHIS software can generate cross-reference specifications automatically by comparing index strings with entries in its thesaurus. For example, for the cross-reference specification just given, the required thesaurus entry is

Compaction=@Increase? in <Density>? by <@Reduction? of <Pore Space>>
In order for the cross-reference specification to be generated, the first part of the thesaurus entry, such as "Compaction" above, must be the first part of at least one index string. The second part of the thesaurus entry, depending on whether it consists of one or two equal signs, indicates whether to generate a "SEE" or a "SEE ALSO" cross-reference specification; the third part becomes the beginning of the cross-reference specification.
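A minimal sketch of this step follows. It assumes that one equal sign yields "SEE" (as in the example above) and two yield "SEE ALSO", and it approximates "is the first part of at least one index string" with a simple prefix test; the function name and the sample index string are hypothetical.

```python
def cross_reference_specification(thesaurus_entry, index_strings):
    """Turn 'Preferred=First part' or 'Preferred==First part' into a
    cross-reference specification, or return None if no index string
    begins with the preferred term."""
    preferred, equals, rest = thesaurus_entry.partition("=")
    if not equals:
        raise ValueError("thesaurus entry contains no equal sign")
    if rest.startswith("="):
        connective, first_part = "SEE ALSO", rest[1:]
    else:
        connective, first_part = "SEE", rest
    # Assumption: a plain prefix test stands in for "is the first part of".
    if not any(s.startswith(preferred) for s in index_strings):
        return None
    return f"{first_part} * {connective} {preferred}"

entry = "Compaction=@Increase? in <Density>? by <@Reduction? of <Pore Space>>"
print(cross_reference_specification(entry, ["Compaction of Soils"]))
```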

7.1.4 Other index elements

A type of information often included in indexes concerns the general sense in which an access term has been used. One way of conveying this information is by listing some synonyms. The CIFT software, for example, automatically generates a "used for" note; the requirement is that an access term in an input string have one or more non-preferred synonyms listed in the CIFT thesaurus. Thus, suppose access is to be provided to an indexed item under "HESSE, HERMAN (1877-1962)". The fact that "Sinclair, Emil" is listed in the thesaurus as a non-preferred term for this author's name will result in an index element
HESSE, HERMAN (1877-1962)
Used for: Sinclair, Emil

7.2 SORTING

The order in which index entries and other index elements appear within the index as a whole is known as filing order. Filing order is determined jointly by the elements to be sorted and the rules employed in sorting them.  In string indexing, sorting should produce a filing order which activates the good qualities of the index elements, especially collocation and predictability.

Predictability is most important in the early part of a search, when the searcher is looking for index elements beginning in a certain way. Searchers should be able to look at any index element and decide quickly whether the part of the index that they are seeking is before or after that element. In this way, they can very rapidly narrow down the elements through which to search in a way quite similar to the binary search routine used in some computerized systems. Thus, sorting as it affects long sequences of elements should reflect the predictability of index strings.
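The comparison with binary search can be made concrete with a short sketch. It uses Python's standard `bisect` module on a list already in filing order; the function name `find_sequence` is an assumption made for the example.

```python
from bisect import bisect_left

def find_sequence(sorted_elements, prefix):
    """Locate by binary search the run of elements beginning with prefix."""
    start = bisect_left(sorted_elements, prefix)
    end = start
    while end < len(sorted_elements) and sorted_elements[end].startswith(prefix):
        end += 1
    return sorted_elements[start:end]

index = [
    "Edwards, T. 1975",
    "Farradane and Yates-Mercer. 1973",
    "Farradane, J. 1977",
    "Farradane, J.E.L. 1950",
    "Fischer, M. 1966",
]
print(find_sequence(index, "Farradane"))
```

The search halves the remaining range at each comparison, which is possible only because every element can be judged as filing before or after the sought position.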

When searchers are scanning short sequences of index elements rather than seeking sequences to scan, collocation tends to be important. Thus, sorting as it affects short sequences should produce good collocation of index strings.

Other qualities of filing order may also be important for the later parts of the search. For example, searchers probably tend to scan or browse from the beginning of a sequence; if so, index elements might best be placed first which are most likely to be relevant or which contain information helpful for understanding later index elements. Thus, a note explaining the meaning of a lead term should generally file before index entries beginning with the term.

7.2.1 Simple character-by-character sorting

The entries and other elements of an index are generally all strings of characters; a simple computerized sorting procedure sorts a set of character strings character by character according to a standard ranking of the different characters. The character strings are arranged first according to the rank of their first characters; where two or more character strings have the same first character, they are subarranged by the ranks of their second characters; and so on. Thus, every character, whether blank, punctuation mark, accent, or letter or numeral, is significant at exactly the place where it occurs in a string. The standard ranking of different characters is that of the American Standard Code for Information Interchange (ASCII). In ascending order, this ranking in part runs:
(reading across each row, then down)
space ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9
: ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
[ \ ] ^ _ `
a b c d e f g h i j k l m n o p q r s t u v w x y z
{ | } ~
Take, for example, the sorted list:
Edwards, T. 1975
Farradane and Yates-Mercer. 1973
Farradane, J. 1977
Farradane, J.E.L. 1950
Fischer, M. 1966
Here, "Edwards, T. 1975" is first because "E" ranks before "F"; "Fischer, M. 1966" follows the strings beginning with "Farradane" because "i" ranks after "a"; "Farradane and Yates-Mercer. 1973" precedes the other two "Farradane" strings because the space ranks before the comma; and "Farradane, J. 1977" comes before "Farradane, J.E.L. 1950" because the space ranks before "E".
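This behaviour is easy to reproduce. In Python, for instance, the built-in `sorted` compares strings code point by code point, and for the characters used here the code points coincide with the ASCII ranks; the variable names below are chosen only for the example.

```python
entries = [
    "Fischer, M. 1966",
    "Farradane, J.E.L. 1950",
    "Farradane, J. 1977",
    "Edwards, T. 1975",
    "Farradane and Yates-Mercer. 1973",
]

# Default string comparison: character by character, by code point.
ordered = sorted(entries)
for entry in ordered:
    print(entry)

# The ranks that decide the comparisons discussed above.
print(ord(" "), ord(","), ord("E"), ord("a"), ord("i"))
```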

Simple character-by-character sorting has weaknesses with respect to more than one of the desirable qualities of index entries, including predictability. True, where any string will appear in such an arrangement is completely predictable given complete knowledge of two things: 1. the ranking of all characters; 2. the exact form of any index string. On the other hand, the ordinary searcher of a string index display is unlikely to possess complete knowledge of either. Searchers may remember the ranking of the letters of the alphabet moderately well; they are much less likely to know that all the lower-case letters rank after all the capitals or that square brackets ("[", "]") rank after capitals while parentheses ("(", ")") rank before. Likewise, they will have difficulty in foreseeing which letters might be capitalized and which punctuation marks and accents might appear. Even a searcher who knew the complete character ranking and the exact form of any string might find predicting some other, more familiar, arrangement easier.

Thus, string indexing systems may require sorting procedures more sophisticated than simple character-by-character sorting. The degree of sophistication required depends on many factors, including the forms of the index strings, their similarity, and the number of indexed items. Some string indexing systems, such as CIFT, have fairly sophisticated sorting options built into their software; some, such as PRECIS (Cain 1984), make use of filing procedures applied to other bibliographic products; others, such as NETPAD and LIPHIS, use very simple sorting; PERMDEX and some versions of KWIC and KWOC do not even use the entire index strings, but only their initial parts, to determine the sorted order.

7.2.2 Other features of general sorting software

Even when simple character-by-character sorting is inadequate, designers of string indexing systems may nevertheless find that general-purpose sorting software possesses additional features sufficient to meet their needs. Computer users, quite apart from searchers of string indexes, are known to regard the distinction between certain characters as relatively insignificant. Thus, computerized sorting procedures often automatically rank lowercase letters the same as their uppercase equivalents; and more powerful sorting software allows users to specify their own rankings of characters, with more than one character sharing the same rank if desired. A possible user-defined ranking of characters for string indexing is:
)
.
:
;
&
(
-  space
0
1
2
3
4
5
6
7
8
9
A  a
B  b
C  c
D  d
E  e
F  f
G  g
H  h
I  i
J  j
K  k
L  l
M  m
N  n
O  o
P  p
Q  q
R  r
S  s
T  t
U  u
V  v
W  w
X  x
Y  y
Z  z
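A ranking of this kind can be applied through a sort key, as in the following sketch. The table reproduces the ranking just listed, with characters on the same line sharing a rank; the treatment of characters absent from the table (they are simply skipped) is an assumption of the sketch, as are the names `GROUPS`, `RANK`, and `ranked_key`.

```python
import string

# One entry per rank; characters in the same entry share that rank.
GROUPS = (
    [")", ".", ":", ";", "&", "(", "- "]
    + list(string.digits)
    + [upper + lower
       for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)]
)
RANK = {ch: rank for rank, group in enumerate(GROUPS) for ch in group}

def ranked_key(text):
    """Sort key: the list of ranks of the characters in text.
    Assumption: characters not in the table are ignored."""
    return [RANK[ch] for ch in text if ch in RANK]

strings = [
    "LANGUAGES in SOMALIA. POLITICAL ASPECTS",
    "LANGUAGES. POLITICAL ASPECTS",
]
print(sorted(strings, key=ranked_key))
```

Because the period ranks before the space in this table, the shorter, more general string files first.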

7.2.3 Ignoring elements in sorting

Accents and quotation marks are examples of characters which searchers may consider relatively insignificant, and whose presence or absence and whose ranking relative to other characters they may have difficulty anticipating. Take, for example, searchers interested in information on "how to write resumes that get jobs". Most of them will likely regard the distinction between the spellings "resumes" and "résumés" as insignificant and whether the accents are included or not as hard to foresee. Very few are likely to know the rank of the acute accent in ASCII or to have any idea whether they should regard it as preceding or following the "e" to which it is attached.

In what is traditionally called "letter-by-letter" sorting, spaces and hyphens are ignored. This has value when it ranks "key-word" and "key word" equally with "keyword"; on the other hand, the consequent intervention of "coalitions" between "coal faces" and "coal mines" is less desirable.

Some sorting software can be instructed to ignore characters on a list supplied by the user. If such software is not available, system designers can take other approaches.  For example, each index element can begin with a string of characters which determines its place in the sequence; it can then continue with a string of characters which instructs what additions and changes to make to the first string before displaying it as part of the index display.
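Both ideas, ignoring listed characters and keeping a sort form separate from the displayed form, can be sketched briefly. The sketch uses Python's standard `unicodedata` module to drop accents; the function name `sort_form`, the default list of ignored characters, and the folding to lower case are assumptions made for the example.

```python
import unicodedata

def sort_form(text, ignored=" -"):
    """Derive a sort form: strip accents, drop ignored characters, fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = (ch for ch in decomposed
            if not unicodedata.combining(ch) and ch not in ignored)
    return "".join(kept).lower()

elements = ["coal mines", "coalitions", "coal faces"]

# Each element is carried as (sort form, display form); only the first is compared.
pairs = sorted((sort_form(e), e) for e in elements)
for _, display in pairs:
    print(display)

print(sort_form("résumés"), sort_form("key-word"), sort_form("key word"))
```

With spaces ignored, "coalitions" intervenes between "coal faces" and "coal mines", which is the letter-by-letter effect described above.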

The index designer may want whole words to be ignored. If so, indexers may be required to code the words to be ignored explicitly in the index strings or the string indexing software may recognize the words automatically.

Examples of the coding of words to be ignored in sorting are found in the treatment of some proper names in PRECIS. For instance, the PRECIS input string segment

$z11030$cBeecham$eSir$gThomas$eBart
tells the PRECIS software: 1. to display index strings containing the name "Beecham, Sir Thomas, Bart"; 2. to sort the entries as though they contained only the name "Beecham Thomas". The "$e" code indicates both that what follows is to be displayed in italics with a comma before it and that it is not to be regarded in sorting.
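The separation of display form from sort form in this example can be imitated with a few lines of code. The sketch is not the PRECIS algorithm: it assumes that the leading "$z" segment is a control code that is not displayed, that "$c" and "$g" parts are displayed and sorted as written, and it does not model italics. Only the behaviour of "$e" is taken from the description above.

```python
import re

def parse_name_segment(segment):
    """Return (display form, sort form) for a coded name segment."""
    display = ""
    sort_parts = []
    for code, text in re.findall(r"\$([a-z])([^$]*)", segment):
        if code == "z":
            continue  # assumption: control code, neither displayed nor sorted
        if code == "e":
            display += f", {text}"  # shown after a comma, ignored in sorting
        else:
            display += (" " if display else "") + text
            sort_parts.append(text)
    return display, " ".join(sort_parts)

print(parse_name_segment("$z11030$cBeecham$eSir$gThomas$eBart"))
```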

Automatically disregarded words are usually connectives. Examples occur in the software for both ASI and Relational Indexing. The results are not always pleasing to index designers. For example, it is true that searchers aware that "of" is disregarded in sorting can see the following arrangement of subheadings as alphabetical:

Representation
    of footpaths
    generalization and, in automated cartography
    of ice features
    methods of, in travel speed maps
    of nomadic peoples
    problems of, in aeronautical charts
    of railways
Yet, to unaware searchers, the same arrangement may appear merely chaotic (Hall 1973).
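The arrangement above results from a sort key that skips the disregarded word. A minimal sketch follows, assuming that only a leading "of" is disregarded; the stoplist and the function name are hypothetical and do not reproduce the ASI or Relational Indexing software.

```python
IGNORED_LEADING = {"of"}  # hypothetical list of disregarded connectives

def subheading_key(subheading):
    """Sort key that skips disregarded words at the start of a subheading."""
    words = subheading.lower().split()
    while words and words[0] in IGNORED_LEADING:
        words.pop(0)
    return " ".join(words)

subheadings = [
    "of railways",
    "problems of, in aeronautical charts",
    "of nomadic peoples",
    "methods of, in travel speed maps",
    "of ice features",
    "generalization and, in automated cartography",
    "of footpaths",
]
for subheading in sorted(subheadings, key=subheading_key):
    print(subheading)
```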

7.2.4 Chronological order

Where dates are involved, searchers may find chronological order useful and easily recognized, especially within small portions of an overall sequence. The PRECIS software accordingly treats ranges of dates, which are marked by the "$d" code in input strings, in a special way when sorting index elements. A PRECIS date term usually has a starting date and a finishing date; for example, "1800-1860", "B.C. 500-A.D. 1973". Sometimes the starting date is unspecified; for example, "to 1974". The starting date, the finishing date, or both may be qualified as approximate; for example, "ca 1800-1914", "to ca 1900", "ca 1-ca 1600". For sorting, date ranges are ranked first chronologically by the starting date. An unspecified date is ranked before all specified dates, and a definite date is ranked before an approximate date that is otherwise the same. Date ranges with exactly the same starting date are subranked by the finishing date. An example of the results is the sequence of subheadings under "Europe" (British Library Automated Information Service 1979):
Europe
    to ca 1200
    to 1935
    to 1976 - Festschriften
    B.C. 500-A.D. 1973
    ca 1-ca 1600 - Serials
    300-1500
    300-1500. Historiology - Essays
    300-1900 - Readings
    300-1973
    ca 300-ca 1450
    400-1000 - Essays
    ca 400-1066
    476-911
    500-1300 - For Irish students - Secondary school texts
    742-814
    768-814
    800-1150
    1000-1500. Historiology - Essays
    [other subheadings with exact and approximate starting dates after 999]
If sorted character by character, these subheadings would have been in the order:
Europe
    1000-1500. Historiology - Essays
    [other subheadings with exact starting dates after 999]
    300-1500
    300-1500. Historiology - Essays
    300-1900 - Readings
    300-1973
    400-1000 - Essays
    476-911
    500-1300 - For Irish students - Secondary school texts
    742-814
    768-814
    800-1150
    B.C. 500-A.D. 1973
    ca 1-ca 1600 - Serials
    [subheadings with approximate starting dates after 999]
    ca 300-ca 1450
    ca 400-1066
    to 1935
    to 1976 - Festschriften
    to ca 1200
For such sequences of subheadings, chronological order is both more rapidly predictable and more collocative of similar items than an alphabetical arrangement.

How string indexing software accomplishes sorting into chronological order depends in part on the number of different date ranges allowed. PRECIS allows a very large number of different date ranges. Thus, the PRECIS software has to have a general routine for translating a date into a special form for sorting. MLA CIFT, on the other hand, has a limited number of different date ranges, all listed in its thesaurus. The CIFT software therefore simply looks up each date range in its thesaurus, where it finds the corresponding "sort form" (Modern Language Association 1982, p. 26).
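A general routine of the first kind might look like the sketch below. It is not the PRECIS routine: it merely implements the ranking rules stated earlier for date ranges written in the forms shown there ("to 1974", "ca 300-ca 1450", "B.C. 500-A.D. 1973"), and the names `chrono_key` and `_date_key` are assumptions. Any text following the date range is used as a final tie-breaker.

```python
import re

_DATE = re.compile(r"(B\.C\. |A\.D\. )?(ca )?(\d+)")
_RANGE = re.compile(
    r"^(?:to |(?P<start>(?:B\.C\. |A\.D\. )?(?:ca )?\d+)-)"
    r"(?P<end>(?:B\.C\. |A\.D\. )?(?:ca )?\d+)(?P<rest>.*)$"
)

def _date_key(text):
    """(year, is_approximate); an unspecified date ranks before all others."""
    if text is None:
        return (float("-inf"), False)
    era, approximate, year = _DATE.fullmatch(text).groups()
    value = int(year)
    if era is not None and era.startswith("B.C."):
        value = -value
    return (value, approximate is not None)

def chrono_key(subheading):
    """Sort form: starting date, then finishing date, then remaining text."""
    match = _RANGE.match(subheading)
    if match is None:
        raise ValueError(f"no date range at start of {subheading!r}")
    return _date_key(match["start"]) + _date_key(match["end"]) + (match["rest"],)

sample = ["300-1973", "ca 300-ca 1450", "to 1935", "B.C. 500-A.D. 1973", "to ca 1200"]
print(sorted(sample, key=chrono_key))
```

A thesaurus-based approach of the second kind would replace `chrono_key` with a simple dictionary lookup from each permitted date range to its stored sort form.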

7.2.5 General-before-special order

A general-before-special order is one in which an item on a broader topic precedes all items on specific topics included in that broader topic. Such orders are much more common in classified arrangements of books on shelves than in string indexes, but some examples in string indexing can be noted. For instance, MLA CIFT's ranking of date ranges places entries relating to broader chronological periods before those relating to narrower ones with the same starting date; for example, "1700-1899" before "1700-1799". The reverse pattern is found in PRECIS.

The general-before-special principle accounts for wanting the period to rank before the space, rather than after it as it does in ASCII. For example, an item on "the political aspects of languages" is more general than one on "the political aspects of languages in Somalia". Typical index strings starting with "Languages" for these two items are:
LANGUAGES. POLITICAL ASPECTS

LANGUAGES in SOMALIA. POLITICAL ASPECTS
If the period is ranked after the space, the order of these strings is special before general:
LANGUAGES in SOMALIA. POLITICAL ASPECTS

LANGUAGES. POLITICAL ASPECTS

CIFT's "facet codes" are designed to produce a quite extensive general-before-special order; they are, however, not used for this purpose in the MLA CIFT index, but only in the classified listing of indexed items.

7.2.6 Other variations in sorting

Voress (1965) reports another sorting variation in a version of KWIC. Here plural forms always immediately follow the corresponding singular forms, regardless of strict alphabetical order. For example, instead of the alphabetical sequence
RAT
RATIO
RATIONALIZATION
RATIOS
RATS
headings would appear in the order
RAT
RATS
RATIO
RATIOS
RATIONALIZATION
Since plurals normally represent ideas quite close to those of their corresponding singulars, the effect is to improve collocation; there is, of course, some risk to predictability unless searchers are clearly aware of the regrouping.

In English, the pooling of singulars and plurals in sorting can usually be obtained by treating final "S" in a word as a character to be ignored. This method is adopted in the sorting for the KWIC-like index to An Annotated Bibliography on Technical Writing, Editing, Graphics, and Publishing 1950-1965 (Feinberg 1972, p. 141).
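The effect of ignoring a final "S" can be sketched as a sort key. The function name `pooled_key` is hypothetical, and using the full word as a tie-breaker (so that the singular precedes its plural) is an assumption of the sketch rather than a documented feature of either index.

```python
def pooled_key(word):
    """Sort key that ignores a final 'S'; the full word breaks ties."""
    stem = word[:-1] if word.endswith("S") else word
    return (stem, word)

headings = ["RAT", "RATIO", "RATIONALIZATION", "RATIOS", "RATS"]
print(sorted(headings, key=pooled_key))
```

The rule is crude: a singular that itself ends in "S", such as "GAS", is also shortened, which is one reason the method works only "usually".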

A more obvious departure from alphabetical or chronological order is seen in a version of KWOC used to index dissertations at the Catholic University of America (Feinberg 1973, pp. 113-117). Here, entries beginning with the same lead term are subarranged by the horizontal position at which the lead term is repeated in the subheading display. For example, index strings beginning with "COMIC" appear in the order:

COMIC
    UNITY IN THE COMIC FORM.

    THE NATURE OF THE COMIC ACTION.

    AN ANALYSIS OF THE COMIC ELEMENTS IN TWELFTH NIGHT.

    A COMPARATIVE ANALYSIS OF THE COMIC ELEMENTS AS FOUND IN A SHAKESPEAREAN COMEDY, AS YOU LIKE IT AND TRAGEDY, HAMLET.
The purpose is evidently to increase eliminability: searchers can not only readily scan the beginnings of subheadings by looking straight down the lefthand side of the display; they also have the option of quickly scanning the immediate contexts in which the lead term appears in the titles by looking diagonally from upper left to lower right. Collocation does not suffer in comparison to basic KWOC, because subarrangement there is by the identification numbers of the indexed items and so has little or no collocative value.
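This subarrangement amounts to sorting by the character position at which the lead term first occurs in each title, as the following sketch shows. The function name is an assumption, and nothing is known here about how the original software handled ties or titles in which the lead term is absent.

```python
def position_key(title, lead_term):
    """Horizontal position at which the lead term first appears in the title."""
    return title.index(lead_term)

titles = [
    "A COMPARATIVE ANALYSIS OF THE COMIC ELEMENTS AS FOUND IN A SHAKESPEAREAN COMEDY, AS YOU LIKE IT AND TRAGEDY, HAMLET.",
    "AN ANALYSIS OF THE COMIC ELEMENTS IN TWELFTH NIGHT.",
    "UNITY IN THE COMIC FORM.",
    "THE NATURE OF THE COMIC ACTION.",
]
for title in sorted(titles, key=lambda t: position_key(t, "COMIC")):
    print(position_key(title, "COMIC"), title)
```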

7.3 FORMATING OF DISPLAY

Index strings typically consist mostly of text, and formating of text to make it easier to read and understand can be classified broadly into layout and typography. Layout is how the parts of the text are positioned on the page or screen; typography, the style of print used. For improving the effectiveness of searches in index displays, research (Spencer and others 1974, 1975) suggests that: 1. layout is more important than typography when searchers are looking for lead terms; 2. typography may be more useful to searchers trying to match parts of index elements other than the lead term.  For influencing searchers' preferences among different forms of index display, layout seems to be the more important aspect (Hartley and others 1979).

7.3.1 Typography

Common examples of the use of typographical variation in ordinary text are: chapter headings printed in larger typefaces; footnotes printed in smaller typefaces; book titles printed in italics; and key terms underlined or printed in boldface characters.

Some early KWIC indexes employed overstruck characters to emphasize parts of the display, and mixtures of upper and lower case were also introduced (Fischer 1966). Modern equipment has many possibilities for typographical variation. Underlining and highlighted or boldface display capabilities are common. "Reverse-field" and various color combinations are available for screen displays. Computer-controlled typesetting may allow a large number of different type styles.

Since searchers first search an index display for suitable lead terms, the lead terms should stand out from the rest of the index display. The PRECIS software displays these terms in boldface type. NETPAD software can display them in reverse-field (black-on-white instead of white-on-black). CIFT makes them stand out by printing them in capital letters, while displaying most of the rest of the index strings in lower case. Research shows that words in all capitals are easier to search for but more difficult to read than words in lower case; i.e., more "recognizable" but less "legible" (Foster and Bruce 1982).

Displaying connectives differently from terms may also help searchers. For example, for some searches links between terms may not be important; displaying connectives in a different way then makes it easier for the searcher to concentrate on just the terms. LIPHIS usually distinguishes connectives from terms by starting each term with a capital letter, in the same way as ordinary text often does in titles of articles and books; NEPHIS can use the same technique. NETPAD displays terms in all upper case and connectives in all lower case. A similar effect is achieved in one form of KWIC index by displaying stopwords in lower case and all other words in upper case (Feinberg 1973, p. 73). With the same aim, but without using typographical variation, POPSI can distinguish prepositions by putting them in parentheses.
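The upper-case/lower-case convention is simple to sketch when connectives can be recognized from a list. The list below and the function name `emphasize_terms` are assumptions made for illustration; a real system might instead take the distinction from codes in the input string.

```python
CONNECTIVES = {"in", "of", "by", "as", "for", "and"}  # hypothetical list

def emphasize_terms(index_string):
    """Display connectives in lower case and all other words in upper case."""
    return " ".join(
        word.lower() if word.lower() in CONNECTIVES else word.upper()
        for word in index_string.split()
    )

print(emphasize_terms("Microcomputers. Use in education"))
```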

If the connectives are mostly punctuation marks or other special symbols, they may not need to be made more distinct than they are. PRECIS, for example, does not use typographical variation for most connectives; instead, especially near the beginnings of index strings, it prefers periods (and colons, commas, and ampersands) to words like prepositions to connect terms.

Typographical variation can also distinguish long segments of an index string from short segments. This kind of distinction can help the searcher to grasp the meaning of a complex description more quickly. One method is for connectives between relatively large segments to be placed in italics, while other connectives remain in ordinary type. Another method is to display different major segments of the index string in different typefaces. A single PRECIS index string illustrates both techniques:

Abstracts
    Titles + abstracts or titles only. Keywords. Retrieval performance - Comparative studies
The PRECIS indexer achieves the italicizing of "or" here by means of a special set of codes originally designed for proper names. The italicizing of "Comparative studies" is more automatic. An automatic shift from ordinary type to italics occurs where a PRECIS index string stops describing the subject of an item and starts describing other aspects, like its physical form or the author's approach.

Typographical variation can substitute to some extent for word order in emphasizing certain portions of the index string which are especially important for eliminability. For example, both PANDEX and CIFT use capitalization to show the repetition of the lead term in the subheading. The searcher thus has the option of skipping readily to the context of the lead term without reading the subheading from the beginning.

7.3.2 Layout

Instead of, or as well as, being displayed in boldface or capitals or the like, lead terms can be placed where they stand out with some blank space around them. Typically, this is done by indenting other parts of the index display, so that the lead terms project on the lefthand side. Many index display systems emphasize lead terms in this way. The lead terms may also be placed in a separate column from the rest of the display; some KWOC indexes take this approach.

A frequent method of increasing the efficiency of index-display searching is to avoid repeating identical parts of adjacent index elements. A common way of achieving this involves dividing each index string into a heading and a subheading. When two or more adjacent index strings have the same heading, the heading can be displayed once for all of them, at the beginning, followed by the different subheadings. It seems that searchers can hold the ideas represented by the heading in mind while scanning a subheading. They can then read the next subheading in the appropriate context. A heading may appear on the same line as its first subheading; but more usually it occupies a line, or lines, of its own.

When many subheadings under a single heading all begin in the same way, each subheading can itself be divided, into a subheading and a subsubheading. This further division is, however, rare in string indexing systems.

A simple rule for dividing an index string into a heading and a subheading is to make the lead term into the heading and the rest into the subheading. ASI, Relational Indexing, KWPSI, and LIPHIS, as well as simple systems like KWOC, all take this approach. CIFT often does so, but sometimes adds a connective expression like "IN LITERATURE" after the first term to complete the heading.
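The simple rule, together with the suppression of repeated headings, can be sketched as follows. The sketch assumes that the lead term ends at the first period followed by a space, which is only one possible convention; the function names and the second and third sample strings are hypothetical.

```python
from itertools import groupby

def split_lead(index_string):
    """Divide an index string into (heading, subheading) at the first '. '."""
    heading, _, subheading = index_string.partition(". ")
    return heading, subheading

def display(index_strings, indent="    "):
    """Show each heading once, followed by its indented subheadings."""
    lines = []
    ordered = sorted(index_strings, key=split_lead)
    for heading, group in groupby(ordered, key=lambda s: split_lead(s)[0]):
        lines.append(heading)
        lines.extend(indent + split_lead(s)[1] for s in group)
    return "\n".join(lines)

print(display([
    "MICROCOMPUTERS. USE in EDUCATION",
    "MICROCOMPUTERS. DESIGN",
    "COMPUTERS. HISTORY",
]))
```

Because the heading is always the lead term alone, every index string sharing that term falls under a single displayed heading.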

PRECIS allows very long headings. For example, in the index string

Acquisition. Books. Stock. Libraries. Universities. United States
    Selection. Approval plans - Reports, surveys
everything before "Selection" is the heading. Long headings like this are not likely to be shared by more than one index element, and the main purpose { 159} of distinguishing headings from subheadings seems to be thwarted. Even when more than 100 index elements begin with "Acquisition", a PRECIS index display will repeat this term each time the other parts of the heading are different. By contrast, in a system in which the lead term alone always forms the heading, the lead term "Acquisition" could be displayed once for a large number of index elements.

PRECIS' long headings may also create retention problems, to the extent that longer headings represent more complex ideas which are more difficult to remember. Moreover, if the searcher forgets the idea represented by the heading and must reread the heading, this rereading process will be longer the longer the heading is. Overrun of the heading onto a second line can also cause loss of clarity: the searcher tends to misinterpret the overrun as a subheading (Keen 1977c).

Strict adherence to single-term headings may be seen as too extreme. A single term may be ambiguous - e.g., in PRECIS, "DOCKS" can mean both weeds and places for ships to tie up - or it may simply be shared by too many index strings. Thus, NETPAD allows searchers to control the length of the headings by adjusting the "subheading threshold". Likewise, the Double-KWIC index producer can select longer or shorter headings from the list presented by the software.

Locators also need special treatment: they are the last parts examined if the index entries appear relevant and the parts always ignored if the index entries appear irrelevant. Typically, a locator is placed either on a separate line at the end of the entry or off to the right, or both.
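Placing a locator both on a separate line and off to the right might be done along the following lines. This is a sketch under stated assumptions: the function name `layout_entry`, the line width, the hanging indent for overrun lines, and the asterisk before the locator are all choices made for the example rather than features of a particular system.

```python
import textwrap

def layout_entry(text, locator, width=48, indent="    "):
    """Wrap the entry text, then put the locator on its own right-aligned line."""
    # Overrun lines get a hanging indent so they are not mistaken for new entries.
    lines = textwrap.wrap(text, width=width, subsequent_indent=indent)
    # The locator is pushed to the right margin on a separate line.
    lines.append(("* " + locator).rjust(width))
    return lines

for line in layout_entry(
    "Online - & sovereignty of Canada. Relationships", "3 80-100", width=40
):
    print(line)
```

Searchers who judge the entry irrelevant can then skip the last line without reading it, while those who judge it relevant know where to look.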

Online index displays may be simplified by omitting the locators altogether. This is possible because the computer system stores the locators anyway and can theoretically retrieve them in response to a further command. Indeed, if the computer system can use the locators to retrieve the indexed items themselves, searchers may never need to see the locators at all. If locators are omitted from an online index display, care needs to be taken not to blur the boundaries between index elements. Moreover, redundant entries, which would be less easily recognized by searchers without their locators, should perhaps be eliminated in some way.
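One possible way of omitting locators while eliminating the resulting redundant entries is sketched below. The function name `online_display` is hypothetical, as is the second locator in the sample data, which has been invented to create a duplicate; the sketch simply keeps the locators in a lookup table so that they can be retrieved in response to a further command.

```python
def online_display(entries):
    """Return (display strings, stored locators) for an online display.

    `entries` is a list of (index_string, locator) pairs. Each distinct
    index string is displayed once; its locators are kept out of sight.
    """
    locators = {}
    for index_string, locator in entries:
        locators.setdefault(index_string, []).append(locator)
    # Dictionaries preserve insertion order, so the display keeps the
    # original sequence while showing each index string only once.
    display = list(locators)
    return display, locators

entries = [
    ("Books. Demand. Data", "1 65-68"),
    ("Books. Demand. Data", "4 12-20"),   # invented locator, for illustration
    ("Bishop. Olga -", "2 139-140"),
]
display, stored = online_display(entries)
print(display)
print(stored["Books. Demand. Data"])
```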

Where terms are restricted in length, their beginnings can be emphasized by arranging the terms in separate columns. This approach was taken in TABLEDEX and SLIC, but appears generally to lead to the usual problems of "white space".

The effects of layout may be enhanced by the use of contrasting grounds. For example, KWIC indexes to Biological Abstracts and Current State Legislation deemphasize the parts of the titles preceding the lead terms by displaying them on a darker background (Fischer 1966). { 160}

7.3.3 Illustration of formating

To show how a string index can be formated to make a more searchable display, here are two sample displays of the same part of a string index. The first has been formated into paragraphs, one paragraph for each index element. The second has been formated to make a more searchable display. The same NEPHIS input strings have been used in producing both displays.
Minimally formated display
Bibliographic networks. Computerized -. Databases accessible in Canada * 3 110-122
Bibliographic networks. Computerized - for publications of governments * see also CODOC
Bibliographic systems. Computerized -. Role of intermediaries * 3 123-147
Bibliographic systems. Computerized -. Use in reference work * 1 15-29
Bibliographic systems. Online - & sovereignty of Canada. Relationships * 3 80-100
Bibliographic systems. Online -. Query negotiation. Model * 4 86-98
Bishop. Olga -. Publications of the Government of Ontario 1867-1900 * 2 139-140
Books. Demand. Data. Use in distribution of books among branches of library systems * 1 65-68
Books in French Language relating to history of Canada cited in National Union Catalog Pre-1956 Imprints. Bibliographic control. Role of WHSTC project * 5 115-123
Branches of library systems. Distribution of books. Use of data on demand for books * 1 65-68
Highly formated display
BIBLIOGRAPHIC
  Networks
    . Computerized -. Databases accessible in
        Canada
                                       * 3 110-122
    . Computerized - for publications of
        governments
                                see also CODOC
  Systems
    . Computerized -. Role of intermediaries
                                       * 3 123-147
    . Computerized -. Use in reference work
                                        * 1  15-29
    . Online - & sovereignty of Canada.
        Relationships
                                       * 3  80-100
    . Online -. Query negotiation. Model
                                        * 4  86-98
BISHOP
  . Olga -
    . Publications of the Government of
        Ontario 1867-1900
                                       * 2 139-140
BOOKS
  . Demand
    . Data. Use in distribution of books among
         branches of library systems
                                        * 1  65-68
  in French Language
    relating to history of Canada cited in
        National Union Catalog Pre-1956
        Imprints. Bibliographic control. Role
        of WHSTC Project
                                       * 5 115-123
BRANCHES
  of library systems
    . Distribution of books. Use of data on
        demand for books
                                        * 1  65-68
In the second display, a number of kinds of formating can be noted: the index elements are divided into headings, subheadings, and subsubheadings; both headings and subheadings are in boldface type; the headings are in capitals; subheadings and subsubheadings are indented; connectives and non-access terms marking major segments are underlined; locators are on separate lines and off to the right; shared headings and subheadings are not repeated; and line lengths have been shortened.

7.3.4 Multi-page and multi-screen displays

The discussion so far has taken no account of the special needs of displays which the searcher cannot view all at once. In practice, some printed indexes run to hundreds of pages in length; even a fairly specific query to an online { 161} system could lead to a series of index elements which would require several screens. To search multi-page or multi-screen index displays efficiently, searchers need some additional aids.

The divisions between pages or screens must not complicate searching. A good general rule, though one which might be broken for special purposes, is not to display part of an index element on one page or screen and part on another. An important consequence of this rule concerns index elements which share the same heading but are displayed on different pages { 162} or screens: each page or screen should repeat the shared heading. Otherwise, searchers may guess an unseen heading, with the possibility of error; or they may check another page or screen and so lose time.
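The rule can be sketched as a simple pagination routine. The sketch makes several simplifying assumptions: each heading and each subheading occupies exactly one line, a page holds a fixed number of lines, and the function name `paginate` and the sample subheadings are invented for the example.

```python
def paginate(groups, page_length):
    """Divide an index display into pages without splitting index elements.

    `groups` is a list of (heading, [subheading, ...]) pairs. A heading
    is repeated at the top of every page on which its subheadings appear.
    """
    if page_length < 2:
        raise ValueError("a page must hold a heading and one subheading")
    pages, page = [], []
    for heading, subheadings in groups:
        heading_shown = False  # is this heading already on the current page?
        for subheading in subheadings:
            needed = 1 if heading_shown else 2  # subheading, plus heading if absent
            if len(page) + needed > page_length:
                pages.append(page)              # start a new page
                page = []
                heading_shown = False
            if not heading_shown:
                page.append(heading)            # show (or repeat) the heading
                heading_shown = True
            page.append("  " + subheading)
    if page:
        pages.append(page)
    return pages

groups = [
    ("BIBLIOGRAPHIC", ["Networks. Computerized", "Systems. Computerized", "Systems. Online"]),
    ("BOOKS", ["Demand"]),
]
for number, page in enumerate(paginate(groups, page_length=3), start=1):
    print("--- page", number, "---")
    print("\n".join(page))
```

Because room for the heading is reserved together with its first subheading, a heading is never left stranded as the last line of a page.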

Searchers often need to know if other index elements which fit their needs are displayed on pages or screens before or after the ones at which they are looking. They can find out by looking at these other pages or screens, but helpful messages will save them this effort. If an adjacent page or screen displays index elements which share the same heading as one or more on the current page or screen, the message "[continued]" can appear.
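Deciding when a "[continued]" message is called for might be sketched as follows. The function name `continuation_flags` and the representation of a page as a list of (heading, subheading) pairs are assumptions made for the example; where the message is actually displayed is left open.

```python
def continuation_flags(pages):
    """For each page, report whether a heading continues across its edges.

    `pages` is a list of non-empty pages, each a list of
    (heading, subheading) pairs. The result holds one
    (from_previous, onto_next) pair of booleans per page.
    """
    flags = []
    for i, page in enumerate(pages):
        first_heading, last_heading = page[0][0], page[-1][0]
        # The heading at the top is shared with the end of the previous page.
        from_previous = i > 0 and pages[i - 1][-1][0] == first_heading
        # The heading at the bottom is shared with the start of the next page.
        onto_next = i + 1 < len(pages) and pages[i + 1][0][0] == last_heading
        flags.append((from_previous, onto_next))
    return flags

pages = [
    [("BIBLIOGRAPHIC", "Networks"), ("BIBLIOGRAPHIC", "Systems. Computerized")],
    [("BIBLIOGRAPHIC", "Systems. Online"), ("BOOKS", "Demand")],
    [("BRANCHES", "of library systems")],
]
print(continuation_flags(pages))
```

A display routine could then show "[continued]" beside the first heading of a page whose first flag is true, and at the foot of a page whose second flag is true.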

Index display pages or screens are not exactly the same as pages of a more ordinary kind of text, like that of a textbook. For instance, they do not need "headlines" at the top of each page, and separate from the text itself, to help in finding the right passages: the heading of the first index element displayed at the top of each page or screen is usually enough.

Chapter 7 Summary

Once index entries have been generated, three further processes go into producing an index display: 1. enriching with cross-references and other index elements; 2. sorting the index elements; and 3. formating the display.

Cross-references generally replace terms not needed for detail in index strings. They reduce the bulk of the index but may also slow searchers by requiring them to take detours in their searches. Cross-references to or from multiterm expressions, in comparison with those between single terms, have arguable advantages in eliminability, collocation, and clarity, but take longer to read and lead to a bulkier index. Several string indexing systems also allow the generation of permuted cross-references.

Sorting should produce a filing order which activates good index string qualities such as collocation and predictability. Simple character-by-character sorting has weaknesses. General-purpose sorting software provides some additional useful features, such as treating upper and lower case as equivalent. Ignoring of some parts of index elements in sorting may facilitate searching, but is sometimes carried to undesirable extremes. Chronological order is sometimes more useful than a relatively strict alphabetical arrangement. Sorting and index element form may also be designed to produce a general-before-special order. Other variations are rare.

Formating of index displays can mostly be divided into typography and layout. Aspects of index entries to be emphasized by typography and layout include: lead terms and locators versus other parts; terms versus connectives; and long versus short segments. Display of a heading once for a number of index elements decreases bulk and usually makes for more efficient searching. Layout for multipage or multiscreen displays requires attention to the divisions between pages or screens.
