
CHAPTER 6
OTHER ASPECTS OF INDEX STRINGS

A single introductory book cannot deal in detail with all of the finer points of index string generation in the many and varied string indexing systems that have been designed. This chapter, however, will consider some aspects not yet touched upon, under three main headings: term omission; term repetition; and the treatment of parallel parts of descriptions.

6.1 OMISSION OF TERMS

The same terms do not always appear in all the index strings from a given input string. Terms are sometimes omitted selectively from index strings in which they are not essential. The main justification for omitting terms from certain index strings is that index bulk is thereby reduced, with the presumable result of better efficiency in searching. Another important reason is that software often generates and sorts short index entries containing few terms more quickly than longer index entries. Short index strings are also sometimes clearer than longer descriptions.

The omission of terms from certain index strings may or may not cause a significant loss of detail. This section will deal first with cases where significant detail is sacrificed and then with cases in which detail is not sacrificed.

6.1.1 Sacrifice of detail

Against the justifications for term omission noted above, omission of terms needed for detail has a substantial negative side. Loss of detail increases the chances that searchers will retrieve relatively useless indexed items or reject relatively useful ones, because they must guess at the missing information. The burden of lack of detail is especially heavy when many items all have the same insufficiently detailed index string. Sharp's sample SLIC index, for instance, has 22 undifferentiated locators with the index string "KNITTING: YARNS" (Sharp 1966). A searcher interested in knowing which specific aspects of knitting and yarns the indexed items discuss has little choice but to retrieve all 22 items.

Many KWIC indexes sacrifice detail by crude truncation of index strings. PERMUTERM is less crude in appearance, but allows even less detail by including only two terms in each index string.

In TABLEDEX and SLIC, the amount of detail sacrificed from the input string varies in very systematic ways from one index string to another. In TABLEDEX, for example, the different index strings from the same input string can be arranged to represent a chain of classes, like those in chain indexing, with each class including all that follow it.

The designers of TABLEDEX and SLIC justify both the sacrificing of detail and the patterns in which detail is sacrificed on the basis of a view of searching that should no longer be applicable to string indexing. In this view, searchers are assumed to be familiar with the vocabulary of the index and to search with particular fixed combinations of all relevant terms in mind, as in a Boolean query system. A TABLEDEX searcher is expected to look for index strings beginning with the least common term in the set of relevant terms. There is no point in having less common terms later in the index string: if they are relevant to the searcher's choice, the searcher, anticipating this, should look under them rather than under the more common term. Thus, a searcher looking under a common term is presumed to be searching for a very general class of indexed items and not to mind the lack of detail in the index entries. Similar presumptions apply in SLIC, except that "first in alphabetical order" is substituted for "least common".
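
The entry-point presumptions described above can be sketched in Python. This is a minimal illustration, not code from either system: the function names and the posting counts are hypothetical, and only the two selection rules ("least common" and "first in alphabetical order") come from the text.

```python
# Illustrative sketch of where a searcher is presumed to start looking.
# The posting counts below are invented for the example.

def tabledex_entry_term(query_terms, postings):
    """Pick the least common term; ties are broken alphabetically."""
    return min(query_terms, key=lambda term: (postings[term], term))


def slic_entry_term(query_terms):
    """Pick the term that comes first in alphabetical order."""
    return min(query_terms)


postings = {"KNITTING": 120, "YARNS": 45, "WOOL": 300}  # hypothetical counts
query = ["WOOL", "KNITTING", "YARNS"]

print(tabledex_entry_term(query, postings))
print(slic_entry_term(query))
```

Under either rule the searcher is expected to know every relevant term in advance, which is the assumption the paragraph above calls into question.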

Different amounts and kinds of detail are important to different groups of searchers and different information needs. Both KWPSI and NETPAD take some account of this kind of variation by permitting some control of the amount of detail in the index strings by people other than the indexer.

In KWPSI, the index producer can theoretically control the length of index strings by supplying a value for the minimum number of access terms that must be included following the lead term. The KWPSI minimum value in fact seems to have little effect, because other factors, such as requirements for completeness of phrases, dominate what is included in the index strings.

In NETPAD, the searcher can exclude terms which are linked into the description in certain ways by redefining the weights of different types of link and by adjusting the cutoff threshold. When the cutoff threshold is increased, what tends to be lost are terms which do not qualify earlier terms. Take, for example, an indexed item on "the detection of benomyl residues in bananas", analyzed as

# Term
1 DETECTION
2 BENOMYL RESIDUES
3 BANANAS
# Linktype #
1 O 2
2 I 3
With the cutoff threshold set low, possible index strings are:
  1. BANANAS . BENOMYL RESIDUES . DETECTION
  2. BENOMYL RESIDUES in BANANAS . DETECTION
  3. DETECTION of BENOMYL RESIDUES in BANANAS
Raising the threshold may then result in the index strings:
  1. BANANAS
  2. BENOMYL RESIDUES in BANANAS
  3. DETECTION of BENOMYL RESIDUES in BANANAS
This sort of pattern of term loss is brought about mostly by two facts: 1. the backward weights assigned to linktypes are generally lower than the forward weights; and 2. forward is the usual direction for qualification.

6.1.2 Terms omitted without loss of detail

A term, or terms, can be omitted from an index string without loss of detail if the ideas which it represents are implied unambiguously by what remains. Such terms may be divided roughly into those omitted because they occur in equivalent parts of the input string and those omitted for other reasons. Two or more parts of a description are equivalent if detail is lost when all of them are omitted but not when at least one of them is retained.

Equivalent parts can be subdivided into alternates and substitutes. The distinction made here is that alternates appear in the same position in otherwise identical index strings from the same input string, whereas substitutes are designed to express the same information at different positions. In other words, alternates are different ways of saying the same thing in the same context, while substitutes are different ways of saying the same thing to suit different contexts.

6.1.2.1 Alternates
As an example of alternates, the same item might be described as being on "the physical distribution of goods in Eastern Europe" or as being on "the physical distribution of merchandise in Eastern Europe". Either description is sufficiently detailed: there is no need to say both "goods" and "merchandise". But one term or the other must be chosen, because "physical distribution in Eastern Europe" is insufficiently specific. Both "goods" and "merchandise" are terms with which searchers who want this item might start. Thus, either might be the lead term of an index string for the indexed item.

OLPI makes the designation of lead-term alternates especially easy. The indexer needs only to type a "C" into one of the fields on the main screen display for selecting access terms; a second screen display then prompts for alternate lead terms for the access term just marked. For example, the indexer can type in the phrase

Physical distribution of goods in Eastern Europe
and then, after marking "goods" as an access term, specify "merchandise" as an alternate for this term. The result, if "physical distribution" and "Eastern Europe" are also access terms, is index strings such as:
  1. Eastern Europe, physical distribution of goods in
  2. Goods, in Eastern Europe, physical distribution of
  3. Merchandise, in Eastern Europe, physical distribution of
  4. Physical distribution, of goods, in Eastern Europe
CASIN is also well adapted to the designation of alternate lead terms, though their number per input string is limited by the number of "categories" available. In NEPHIS and LIPHIS, coding alternate lead terms is rather tricky, but nevertheless possible.

Alternates are usually desirable only when they start near the beginning of the index string. Their main purpose is to improve recall and search efficiency by avoiding difficulties with predicting which term the index uses to represent an idea, and predictability is mostly desirable in the early part of an index string. Moreover, alternates beginning later in the index string lead to the "stuttering" effect caused by two index entries for the same indexed item appearing too close together. The PRECIS "theme interlink" codes in column 2 theoretically permit alternates at any position desired, but in practice alternates begin very close to the start of the index string. In the Iowa State University system it is normal for an alternate access term to function as an alternate everywhere in the index string. But the index strings are generally short; thus, again no alternate begins far from the start of the index string.

6.1.2.2 Substitutes
The way searchers use the early part of an index string differs from the way they use the later parts. Thus, it is sometimes useful for different terms to be used to convey the same information in different positions in an index string. An illustration can be seen in Double-KWIC. Here, the software recognizes singular and plural forms and creates a common form of lead term for both; for instance, "COST(S)" for titles containing either "COST" or "COSTS" (Petrarcha and Lay 1969b; Lay 1973, 83-89). Meanwhile, the original singular or plural form remains in other positions in the index string. The common singular-plural form of the lead term improves predictability and collocation for the earlier stages of searches; meanwhile, the retention of the specific singular or plural in the context of the subheading improves clarity for the final stage. More radical automatic substitution than this is found in PANDEX, where ancillary input can serve to substitute completely different terms in the lead position. Input strings may actually be written in different languages. For example, suppose that PANDEX's ancillary input includes English translations for selected French terms, such as
traduction --> translation
Then a French-language input string such as
Un modèle sémiotique de la traduction
can lead to index strings with appropriate English terms in the lead position but with the original French term in context in the subheading; for example, assuming that French articles and prepositions are recognized as stopwords,
TRANSLATION
    Un modèle SEMIOTIQUE de la TRADUCTION
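
The two kinds of automatic substitution just described can be sketched in Python. The sketch is illustrative only: the function names, the crude plural rule, and the dictionary standing in for PANDEX's ancillary input are assumptions, not the actual Double-KWIC or PANDEX procedures.

```python
# Illustrative sketch only; not the actual Double-KWIC or PANDEX code.

def merged_lead(word):
    """Map a singular or plural form to a common lead form such as COST(S).

    Assumes, crudely, that a plural is simply the singular plus "S".
    """
    word = word.upper()
    stem = word[:-1] if word.endswith("S") else word
    return stem + "(S)"


def substituted_entry(title, keyword, ancillary):
    """Return a (lead term, subheading) pair for one keyword of a title.

    `ancillary` is a hypothetical lookup table of substitute lead terms.
    The subheading keeps the title in its original wording.
    """
    lead = ancillary.get(keyword.lower(), keyword).upper()
    return lead, title


ancillary = {"traduction": "translation"}
title = "Un modèle sémiotique de la traduction"

print(merged_lead("cost"), merged_lead("costs"))
print(substituted_entry(title, "traduction", ancillary))
```

In both cases the substitute appears only in the lead position, while the context keeps the original form.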

PRECIS, LIPHIS, and NEPHIS all allow substitution for terms when they qualify earlier terms in the index string. Moreover, PRECIS and LIPHIS allow substitution of a phrase for more than one term at once. For example, column 5 in a PRECIS input string is normally a "0"; if it is some other numeral, the PRECIS software recognizes the segment as a substitute for some adjacent segments. The numeral tells how many segments of the input string will be replaced by the substitute. The numeric code in column 6 indicates in which direction the replaced segments lie in the input string: "2", if before the substitute; "1", if after. Take, for instance, the PRECIS input string for an indexed item on "flow of air through aircraft propellers":

$z11030$aaircraft
$zp1030$apropellers
$z10220$aaircraft propellers
$zs1030$aflow$vof$wthrough
$z31030$aair
The first "2" in "$z10220$aaircraft propellers" marks the phrase "aircraft propellers" as a substitute for two segments. The second "2" shows that the segments replaced are those containing the terms "aircraft" and "propellers", and not those containing "air" and "flow". The total set of index strings is thus:
  1. Air
        Flow through aircraft propellers
  2. Aircraft
        Propellers. Flow of air
  3. Flow. Air
        Through aircraft propellers
  4. Propellers. Aircraft
        Flow of air
The aim of the substitution here is to avoid less desirable wording such as "propellers of aircraft" in the entries that start with "Air" and "Flow". The underlying structure may be seen as something like
*FLOW---through---*PROPELLERS---of---*AIRCRAFT
|  |               |
|  |               i.e.
|  |               |
|   ---through---AIRCRAFT PROPELLERS
of
|
*AIR
The index string generator may be seen as following the top "through" link only backwards, never forwards; that is, never for qualification. For qualification, it is the bottom "through" link that is followed.

Substitutions may be especially required in languages where nouns have a case structure. As an example, consider an article on "use of NAD for the determination of glucose in beverages". In English, a NEPHIS indexer assigns the input string

@USE? of <NAD>? for <@DETERMINATION? of <GLUCOSE? in <BEVERAGES>>>
with the index strings:
  1. BEVERAGES. GLUCOSE. DETERMINATION. USE of NAD
  2. GLUCOSE in BEVERAGES. DETERMINATION. USE of NAD
  3. NAD. USE for DETERMINATION of GLUCOSE in BEVERAGES
In German, on the other hand, the indexer needs to take account of the different cases of the word for "beverages": the dative case "GETRÄNKEN" is required after the preposition "IN"; in contrast, the nominative "GETRÄNKE" is required where no preposition precedes. Thus, the input string is
@GEBRAUCH? von <NAD>? zur <@BESTIMMUNG? von <GLUKOSE? in GETRÄNKEN <?GETRÄNKE. >>>
with the index strings:
  1. GETRÄNKE. GLUKOSE. BESTIMMUNG. GEBRAUCH von NAD
  2. GLUKOSE in GETRÄNKEN. BESTIMMUNG. GEBRAUCH von NAD
  3. NAD. GEBRAUCH zur BESTIMMUNG von GLUKOSE in GETRÄNKEN

Because PRECIS sometimes uses periods and sometimes prepositions to represent a link followed for qualification, it has had to introduce the special additional codes "$s" and "$t" for case substitution (Sørensen and Austin 1976b).

6.1.2.3 Lead-only terms
A term omitted both without loss of detail and without regard to equivalent parts is usually a lead-only term; that is, a term included only as a lead term. For example, suppose that an indexed item on "the use of microcomputers in education" has the index strings:
  1. COMPUTERS. MICROCOMPUTERS. USE in EDUCATION
  2. EDUCATION. USE of MICROCOMPUTERS
  3. MICROCOMPUTERS.  USE in EDUCATION
Here, "COMPUTERS" in index string 1 contributes nothing to detail, since searchers presumably know that microcomputers are computers. The purpose of including it is to increase search efficiency and effectiveness by providing additional access to the indexed item for searchers looking under "COMPUTERS". For this purpose, it is needed only as a lead term; hence, it is omitted from the other two index strings.

A lead term which contributes nothing to detail is frequently a generic term for another term in the index string; that is, it represents a broader class of things, which includes the things represented by the other term. Other relationships may also occur; for example, that between a geographical area and a smaller geographical area within it. Cross-references are a very common alternative to the use of lead-only terms. Where the approach of lead-only terms is employed, the result is longer index displays, but with a reduction in the number of times a searcher has to look up a new expression to continue a search.

Features allowing indexers to specify lead-only terms in input strings are found in PRECIS, CASIN, NEPHIS, POPSI, LIPHIS, and some versions of KWOC. For instance, the character in column 6 in a PRECIS input string helps control whether the terms in the segment will be included in all or only in some of the index strings. Usually, this character is a "3", meaning that the terms will be included regardless of the route by which the PRECIS index string generator arrives at them when producing an index string. If the character is a "0", however, the terms in the segment will not be included unless the index string begins with one of them. As a simple example, in the PRECIS input string

$z11000$aUnited States
$zp1030$aColorado River
$z60030$aviews from space
the terms "Colorado River" and "views from space" are marked with a "3" in column 6, but "United States" is marked with a "0". Thus "United States" appears only as a lead term in the corresponding index strings:
  1. Colorado River
        - Views from space
  2. United States
        Colorado River - Views from space
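
The lead-only rule can be sketched in Python as follows. This is a simplified illustration rather than the PRECIS algorithm: the `Term` record and the `index_strings` function are hypothetical, terms are simply joined with periods, and the ordering and typography of real PRECIS entries are not modelled.

```python
from typing import NamedTuple


class Term(NamedTuple):
    text: str
    access: bool = True       # may this term lead an index string?
    lead_only: bool = False   # shown only when it is itself the lead term


def index_strings(terms):
    """Generate one string per access term, dropping lead-only terms
    from every string that they do not lead."""
    strings = []
    for lead in terms:
        if not lead.access:
            continue
        rest = [t.text for t in terms if t is not lead and not t.lead_only]
        strings.append(". ".join([lead.text] + rest))
    return sorted(strings)


terms = [
    Term("United States", lead_only=True),
    Term("Colorado River"),
    Term("Views from space", access=False),
]
for s in index_strings(terms):
    print(s)
```

"United States" appears in the string that it leads and nowhere else, which is the effect of the "0" in column 6 in the example above.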

POPSI places so much emphasis on the inclusion in the input string of lead-only terms that cross-references from generic to specific terms are actually excluded from its indexes. As an example of the frequency of lead-only terms in POPSI input strings, take the input string applied to an item on "decomposition of bacterial cell walls by antibiotics":

MICROBIOLOGY,BACTERIA,CELL,WALL: DECOMPOSITION-CHEMICAL>BIOSUBSTANCE>DRUG >ANTIBIOTIC
The ">" code follows three lead-only terms - "DRUG", "BIOSUBSTANCE", and "CHEMICAL" - all generic to "ANTIBIOTIC".

Lead-only terms need not be supplied with the input string, but can be derived automatically from a thesaurus available to the index string generator. Thus, while NETPAD indexers could specify lead-only terms in input strings by appropriate use of linktype weights and the cutoff threshold, they need not do so; it is simply necessary for thesaural information about terms to be entered also into the NETPAD database. The NETPAD user enters the thesaural information in a form almost identical with ordinary input strings; the key difference is that at least one term is identified, by means of a final exclamation mark, as referring to all occurrences of the thing or class of things named. For example, a user might input the general information that the Colorado River is a river in the United States:

# Term
1 COLORADO RIVER !
2 RIVER
3 UNITED STATES
# Linktype #
1 = 2
2 < 3
The exclamation mark at the end of the term "COLORADO RIVER" tells the software that this information can be applied whenever the term "COLORADO RIVER" turns up in the database. The NETPAD index string generator makes additional index strings by combining the thesaural information with the input strings for specific indexed items. For example, if an indexer also indexes a collection of views of the Colorado River from space -
# Term
1 VIEWS FROM SPACE
2 COLORADO RIVER
# Linktype #
1 ( 2
- a searcher asking for "UNITED STATES" sees an index string like
UNITED STATES. RIVER: COLORADO RIVER. VIEWS FROM SPACE

6.1.2.4 Other terms omitted without loss of detail
Sometimes a term which is not useful for detail is useful not only as a lead term but also to improve collocation or eliminability in other index strings. The appropriate effect can be achieved in PRECIS by means of the "1" ("not up") and "2" ("not down") codes in column 6. An example may be seen in the input string
$z11030$abuildings
$zp1010$aframes$21timber
$zp1030$arafters
$z20030$aconstruction
The "1", rather than the normal "3", in column 6 in the second line means that "frames" and "timber" will be omitted from the index string which puts "rafters" first but not from the index string which puts "buildings" first.  Thus, the index strings are:
  1. Buildings
        Timber frames. Rafters. Construction
  2. Frames. Buildings
        Timber frames. Rafters. Construction
  3. Rafters. Buildings
        Construction
  4. Timber frames. Buildings
         Rafters. Construction
As a result, searchers looking under the heading "Buildings" find entries relating to timber frames in one uninterrupted sequence and not scattered depending on what specific parts of the frames are mentioned. A NEPHIS indexer would have trouble achieving just this effect; a LIPHIS indexer would not.

Why omit such terms at all? Why not simply include them in all the index strings? For a glimpse of the kind of difficulties that could result, consider the theoretical "basic" version of POPSI. Here, the entire input string, with all its generic terms, is repeated in every index string. For example, a document on "chemotherapy of adenocarcinoma of the stomach" has the input string

Medicine, Digestive system > Stomach, Disease > Cancer > Carcinoma > Adenocarcinoma, Treatment > Chemotherapy
and a searcher looking up "Chemotherapy" encounters the index string
Chemotherapy
    Medicine, Digestive system > Stomach, Disease > Cancer > Carcinoma > Adenocarcinoma, Treatment > Chemotherapy
Indeed, no matter from which direction searchers approach the document, they are always led down the complete generic-to-specific paths: "Digestive system" --> "Stomach"; "Disease" --> "Cancer" --> "Carcinoma" --> "Adenocarcinoma"; and "Treatment" --> "Chemotherapy" (Bhattacharyya 1979). Following such paths seems overly time-consuming and the greater index bulk produced by the longer entries a hindrance to searchers. Moreover, one experiment (Raghavan and Iyer 1978) actually suggests that the added information may make the index strings harder to understand.
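
The growth in index bulk under "basic" POPSI can be made concrete with a short Python sketch. It is illustrative only: the function name is hypothetical, and treating every term in the input string as a lead term is an assumption made for the example.

```python
import re


def basic_popsi_entries(input_string):
    """Pair each term with the complete, unshortened input string.

    Terms are taken to be whatever is separated by "," or ">".
    """
    terms = [t for t in re.split(r"\s*[,>]\s*", input_string.strip()) if t]
    return [(term, input_string) for term in terms]


s = ("Medicine, Digestive system > Stomach, Disease > Cancer > "
     "Carcinoma > Adenocarcinoma, Treatment > Chemotherapy")
entries = basic_popsi_entries(s)

# Total characters printed for this one document across all its entries.
bulk = sum(len(lead) + len(body) for lead, body in entries)
print(len(entries), "entries,", bulk, "characters")
```

Because every entry carries the whole input string, the space taken by one document grows roughly with the square of the number of terms.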

6.2 REPETITION

The opposite of omission is repetition, and this section will consider the repetition of parts of the input string in more than one place in the same index string. The usual purpose of repetition is to increase clarity, while its obvious drawbacks are redundancy and increased index bulk. KWOC and other KWOC-like string indexing systems such as PANDEX, POPSI, and CIFT make a rule of repeating the lead term later in the index string. What will be considered below, however, are certain conditions which call for repetition even in some systems that normally avoid it.

6.2.1 Repetition after adjectives

In English, as in a number of other languages, adjectives mostly precede the nouns that they modify. Indeed, preceding may be the key indication that a word is a modifier rather than being modified - "blind Venetian" and "Venetian blind" being a classic example for information retrieval theorists.

A noun modified by an adjective must sometimes precede the adjective in an index string, usually because the noun is the lead term. For such index strings, systems like ASI, KWPSI, NEPHIS, and Relational Indexing simply allow the adjective to stand after the noun. An example is provided by Relational Indexing's treatment of an item on "the relationship of research productivity and informal communication among sleep researchers". The input string for this item is

v=1;s=sleep
v=1;s=researchers
v=1;s=communication [informal]
v=1;s=productivity
v=1;s=research
l=1;w=1;r=7;p=on;w=2
w=2;r=7;l=1;w=3
w=3;p=related to;r=7;w=4
w=5;r=5;p=of;w=4
g=1;w=2;r=9;p=by;l=1;w=5
The noun "communication" appears only once in each of the three resulting index strings, even though in one it must precede the adjective that modifies it:
  1. Communication
        of researchers on sleep.  Related to productivity of research. Informal -,
  2. Research
        by researchers on sleep. Informal communication related to productivity of -
  3. Sleep
        researchers informal communication related to productivity of research

As Michell (1979b) points out, inverted noun-adjective constructions are found also in ordinary English. It may be argued, however, that they slow or hinder comprehension because they are less familiar to searchers. Moreover, particular difficulties may arise when the adjective is considerably separated from its noun, as in index string 1 above.

PRECIS and LIPHIS, weighing the disadvantages of redundancy relatively lightly against the advantage of clarity, prefer to repeat the modified noun. Thus, even though "communication" appears only once in the PRECIS input string

$z21030$asleep
$z20030$aresearch$won
$z20020$a
$z31030$aresearch workers
$z21030$acommunication$21informal
$zt0030$arelated to
$z21030$aproductivity
it is repeated in the index string
Communication. Research workers. Research on sleep
    Informal communication related to productivity

6.2.2 Repetition after connectives

In English and other languages, certain connectives such as prepositions tend to be followed by the expression to which they are connecting, and clarity tends to be improved as a result. "Backward-pointing" prepositions are a major feature of ASI and are allowed in NEPHIS and in Relational Indexing.  PRECIS and LIPHIS, however, avoid them by repetition. Take, for example, an article on "media for publication of information on science". The term "science" appears only once in the PRECIS input string:
$z11030$ascience$21information on
$z21030$apublication
$z30030$amedia
In all the resulting index strings, however, "on" is followed by "science", even if "science" must be repeated:
  1. Information on science
         Publication. Media
  2. Publication. Information on science
        Media
  3. Science
        Information on science.  Publication. Media
In PRECIS, this sort of repetition is fairly rare and is dealt with by stretching the definition of the codes normally used for adjectives ("$21" in the example).

In LIPHIS, term repetition after connectives is somewhat more common and more regularly coded. LIPHIS can, in fact, handle much more complex structures than PRECIS can without extensive duplication of information in the input string. Take, for example, the topic of "assistance to scientists with their problems with foreign languages", which can be represented by the network diagram:

*ASSISTANCE---to---
|                 |
with              |
|                 |
*PROBLEMS---of----*SCIENTISTS
|
with
|
*FOREIGN
|
(type of)
|
*LANGUAGES
In the LIPHIS input string for this topic, the term "Scientists" appears only once:
Assistance 1 to 2 Scientists = 1 with Problems 3 of 2 = 3 with Foreign Languages
In each of the index strings, however, the index string generator inserts the same term twice, once for the "to" link and once for the "of" link:
  1. Assistance
        to Scientists with Problems of Scientists with Foreign Languages
  2. Foreign
        Languages. Problems of Scientists. Assistance to Scientists
  3. Languages
        Foreign Languages.  Problems of Scientists. Assistance to Scientists
  4. Problems
        of Scientists with Foreign Languages. Assistance to Scientists
  5. Scientists
        Assistance with Problems of Scientists with Foreign Languages

6.3 PARALLEL PARTS

In ordinary language, parallel or coordinate parts of descriptions are usually marked by the presence of a coordinating conjunction such as "and". As an example of a description containing parallel parts, take the topic of "the effects of feldspar, slate, and quartz on the lungs of rats". Here, "FELDSPAR", "SLATE", and "QUARTZ" are parallel, as illustrated by the corresponding network diagram:
EFFECTS----of----*FELDSPAR
| | |             |
| | |             &
| | |             |
| |  ------of----*SLATE
| |               |
| |               &
| |               |
|  --------of----*QUARTZ
|
on
|
*LUNGS----of----*RATS
In a sense, three descriptions are combined in one: 1. "the effects of feldspar on lungs of rats"; 2. "the effects of slate on lungs of rats"; and 3. "the effects of quartz on lungs of rats". In the example, each parallel part contains a single term; but multiterm parallel parts are also possible.

Parallel parts are treated specially by several string indexing systems. Recognition of parallel parts by the index string generator usually requires additional coding. The NETPAD index string generator, however, recognizes parallelism from other clues; namely, two links of the same type either from one term to two other terms or to one term from two other terms. For example, in a NETPAD input string for the sample description above, the same type of link from "EFFECTS" to "FELDSPAR", "SLATE", and "QUARTZ" is enough to indicate that the latter three terms are parallel:

# Term
1 EFFECTS
2 FELDSPAR
3 SLATE
4 QUARTZ
5 LUNGS
6 RATS
# Linktype #
1 O 2
1 O 3
1 O 4
1 R 5
5 O 6
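
The recognition rule stated above (links of the same type fanning out from one term, or fanning in to one term) can be sketched in Python. This is a minimal illustration under stated assumptions, not NETPAD's own code: the function name and the triple representation of links are hypothetical.

```python
from collections import defaultdict


def parallel_groups(links):
    """Find groups of term numbers that share a link of the same type
    with a common term, in either direction.

    `links` is a list of (source, linktype, target) triples.
    """
    fan_out = defaultdict(list)  # (source, linktype) -> targets
    fan_in = defaultdict(list)   # (linktype, target) -> sources
    for source, linktype, target in links:
        fan_out[(source, linktype)].append(target)
        fan_in[(linktype, target)].append(source)
    groups = [sorted(v) for v in fan_out.values() if len(v) > 1]
    groups += [sorted(v) for v in fan_in.values() if len(v) > 1]
    return groups


terms = {1: "EFFECTS", 2: "FELDSPAR", 3: "SLATE",
         4: "QUARTZ", 5: "LUNGS", 6: "RATS"}
links = [(1, "O", 2), (1, "O", 3), (1, "O", 4), (1, "R", 5), (5, "O", 6)]

for group in parallel_groups(links):
    print([terms[n] for n in group])
```

For the sample input string, the only group found is the one formed by the three "O" links leaving term 1; "LUNGS" is not included because its link from "EFFECTS" is of a different type.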

Special treatment is applied mostly when the lead term falls within a parallel part of the input string. The reason lies in the main aim of special treatment, which is to improve index string predictability through equal, consistent treatment of all the parallel parts: predictability is an especially important quality in the early part of an index string.  Several techniques are applied to ensure consistent treatment of access terms within parallel parts: omission of parallel parts; repetition of the part containing the lead term; interposing of the other parallel parts; and addition of the other parallel parts at the end of the index string.

6.3.1 Omission of parallel parts

The omission technique takes two forms. In the first form, the input string is treated as defining separate "themes", one for each parallel part, and index strings are generated separately for each theme. As a result, more than one of the parallel parts never appears in the same index string.

Devices for the first form of the omission technique are available in PRECIS, CASIN, and the Iowa State University system. Some portions of the input string are defined as common to all themes, or outside of the parallel parts; other portions, as peculiar to specific themes, or inside one of the parallel parts. In a PRECIS input string, the "theme interlink" code in column 2 is "z" if the segment is common to all themes; "x" if it is the first segment peculiar to a theme; and "y" otherwise. For example, a document with the two themes "costs of installation of electric cookers" and "costs of maintenance and repair of gas cookers" is assigned the single PRECIS input string

$x11030$aelectric cookers
$y20030$ainstallation$wof
$x11030$agas cookers
$y20030$amaintenance & repair$wof
$zp1030$acosts
The resulting index strings are:
  1. Costs. Installation of electric cookers
  2. Costs. Maintenance & repair of gas cookers
  3. Electric cookers
        Installation. Costs
  4. Gas cookers
        Maintenance & repair. Costs
That is, the index strings obtained are exactly the same as those from two separate input strings, one for each theme:
$z11030$aelectric cookers
$z20030$ainstallation$wof
$zp1030$acosts
and
$z11030$agas cookers
$z20030$amaintenance & repair$wof
$zp1030$acosts
In CASIN input strings, categories 10 ("title number", or locator), 31 and 32 ("country"), and 41, 42, and 43 ("type") are common to all themes; other categories are theme specific. In the Iowa State University system, theme specific portions of the input string are marked off with parentheses and semicolons.
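
The PRECIS theme-interlink convention described above can be sketched in Python. The sketch is a simplified illustration, not the PRECIS software: the function name is hypothetical, and it does nothing more than split a multi-theme input string into the equivalent single-theme input strings by reading the code in column 2.

```python
def split_themes(lines):
    """Split a multi-theme input string into one input string per theme.

    Column 2 (the character after "$") is read as the theme interlink:
    "z" = common to all themes, "x" = first segment of a new theme,
    "y" = further segment of the current theme.
    """
    owners = []   # theme number for each line, or None if common
    theme = -1
    for line in lines:
        code = line[1]
        if code == "z":
            owners.append(None)
            continue
        if code == "x":
            theme += 1
        elif code != "y" or theme < 0:
            raise ValueError("unexpected theme interlink in " + line)
        owners.append(theme)
    themes = []
    for t in range(max(theme + 1, 1)):
        themes.append(["$z" + line[2:]
                       for line, owner in zip(lines, owners)
                       if owner in (None, t)])
    return themes


lines = [
    "$x11030$aelectric cookers",
    "$y20030$ainstallation$wof",
    "$x11030$agas cookers",
    "$y20030$amaintenance & repair$wof",
    "$zp1030$acosts",
]
for theme in split_themes(lines):
    print("\n".join(theme), end="\n\n")
```

Each resulting list keeps the segments in their original order and recodes them as "z", so that an ordinary single-theme generator could process each list separately.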

In the second form of omission, the parallel parts do appear in the same index string provided the lead term is not from any of them. Examples can be seen in PRECIS, NETPAD, and a later version of ASI. In input strings in the later version of ASI (Belton 1972), angular brackets ("<", ">") surround access terms and exclamation marks ("!") enclose segments containing parallel terms; e.g.,

Effect of ! <feldspar>, <slate> & <quartz> ! on <lungs> of <rats>
The ASI index string generator generates ordinary index strings for lead terms outside the segment marked by the exclamation marks, but a lead term inside this segment causes the rest of the segment to be omitted from the index string; e.g.,
  1. Feldspar, effect of, on lungs of rats
  2. Lungs, of rats, effect of feldspar, slate & quartz on
  3. Quartz, effect of, on lungs of rats
  4. Rats, lungs of, effect of feldspar, slate & quartz on
  5. Slate, effect of, on lungs of rats
The PRECIS "g" code in column 3, which marks a segment as parallel to the preceding segment, has a similar effect. For example, the PRECIS input string
$z00030$dGreat Britain
$zp1030$arivers$v&
$zg1030$astreams
$z11030$acoarse fish
$z21030$aangling
$z60030$amanuals
has index strings where "rivers" as a lead term suppresses mention of "streams" and vice versa:
  1. Angling. Coarse fish. Rivers & streams. Great Britain
        - Manuals
  2. Coarse fish. Rivers & streams. Great Britain
         Angling - Manuals
  3. Rivers. Great Britain
        Coarse fish. Angling - Manuals
  4. Streams. Great Britain
        Coarse fish. Angling - Manuals
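
The second form of omission can be sketched in Python. This is an illustration of the selection rule only, under stated assumptions: the function name is hypothetical, and the sketch decides which terms appear in an index string without modelling the order or punctuation that PRECIS or ASI would give them.

```python
def visible_terms(terms, parallel, lead):
    """Return the terms to be shown in the index string led by `lead`.

    If the lead term is one of the parallel terms, the other parallel
    terms are suppressed; otherwise every term is shown.
    """
    if lead in parallel:
        return [t for t in terms if t == lead or t not in parallel]
    return list(terms)


terms = ["Great Britain", "rivers", "streams",
         "coarse fish", "angling", "manuals"]
parallel = {"rivers", "streams"}

print(visible_terms(terms, parallel, "rivers"))
print(visible_terms(terms, parallel, "angling"))
```

With "rivers" as the lead term, "streams" is dropped; with "angling" as the lead term, both parallel terms are kept.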

Both forms of the omission technique perform well on most desirable qualities of index strings, but do lead to some loss of detail. Loss of detail is less of a problem with the second form. Unlike the first form, the second form also avoids the "stuttering" effect of having two index strings such as

Angling. Coarse fish. Rivers.  Great Britain
    - Manuals
and
Angling. Coarse fish. Streams. Great Britain
    - Manuals
both beginning in the same way and both referring to the same indexed item. The remaining three techniques, discussed below, equally avoid the stuttering effect, but preserve detail in all the index strings. { 138}

6.3.2 Repetition of parallel parts

Repetition in context of the parallel part containing the lead term arises naturally in a system like KWOC, where the entire input string is always repeated in the index string; e.g.,
  1. FELDSPAR
        EFFECT OF FELDSPAR, SLATE AND QUARTZ ON LUNGS OF RATS
  2. LUNGS
        EFFECT OF FELDSPAR, SLATE AND QUARTZ ON LUNGS OF RATS
  3. QUARTZ
        EFFECT OF FELDSPAR, SLATE AND QUARTZ ON LUNGS OF RATS
  4. RATS
        EFFECT OF FELDSPAR, SLATE AND QUARTZ ON LUNGS OF RATS
  5. SLATE
        EFFECT OF FELDSPAR, SLATE AND QUARTZ ON LUNGS OF RATS
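A minimal Python sketch of this KWOC behaviour follows. It assumes that the access terms are supplied explicitly rather than selected by a stoplist, and the function name is hypothetical.

```python
DESCRIPTION = "Effect of feldspar, slate and quartz on lungs of rats"
ACCESS_TERMS = ["feldspar", "slate", "quartz", "lungs", "rats"]

def kwoc(description, access_terms):
    """Pair each access term with the complete, unaltered description."""
    context = description.upper()
    return [(term.upper(), context)
            for term in sorted(access_terms, key=str.lower)]

for lead, context in kwoc(DESCRIPTION, ACCESS_TERMS):
    print(lead)
    print("      " + context)
```

Because the whole description is repeated under every lead term, the parallel part containing the lead term is repeated in context without any special coding.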

Since 1974, PRECIS has offered the "f" code in column 3 as an option for producing contextual repetition. For example, when "g" is changed to "f" in the input string given above -

$z00030$dGreat Britain
$zp1030$arivers$v&
$zf1030$astreams
$z11030$acoarse fish
$z21030$aangling
$z60030$amanuals
- the index strings are changed to:
  1. Angling. Coarse fish. Rivers & streams. Great Britain
         - Manuals
  2. Coarse fish. Rivers & streams. Great Britain
        Angling - Manuals
  3. Rivers. Great Britain
        Rivers & streams. Coarse fish. Angling - Manuals
  4. Streams. Great Britain
        Rivers & streams. Coarse fish. Angling - Manuals
The main weakness of this technique is, of course, the loss of succinctness. { 139}
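The contrast between the "g" and "f" codes can be sketched in Python. This is a deliberately simplified model, not the PRECIS algorithm: it assumes that each index string consists of the lead term, the earlier segments in reverse order, and the later segments in input order, followed by the form term. The data layout and the function name are hypothetical.

```python
# Each segment: (terms, is_lead). More than one term means parallel terms.
SEGMENTS = [
    (["Great Britain"], False),
    (["rivers", "streams"], True),
    (["coarse fish"], True),
    (["angling"], True),
]

def precis_entries(segments, form, mode="g"):
    """mode 'g' omits the parallel group when the lead term is inside it;
    mode 'f' repeats the whole group in context."""
    def cap(text):
        return text[0].upper() + text[1:]
    def show(terms):
        return cap(" & ".join(terms))
    entries = []
    for i, (terms, is_lead) in enumerate(segments):
        if not is_lead:
            continue
        earlier = [show(t) for t, _ in reversed(segments[:i])]
        later = [show(t) for t, _ in segments[i + 1:]]
        if len(terms) == 1:
            leads = [(terms[0], later)]
        else:
            context = [show(terms)] if mode == "f" else []
            leads = [(term, context + later) for term in terms]
        for lead, display in leads:
            heading = ". ".join([cap(lead)] + earlier)
            second_line = (". ".join(display) + " - " + form).strip()
            entries.append((heading, second_line))
    return sorted(entries)

for heading, second_line in precis_entries(SEGMENTS, "Manuals", mode="f"):
    print(heading)
    print("      " + second_line)
```

With mode "g" the entry under "Rivers" omits "streams" altogether; with mode "f" the second line begins with "Rivers & streams", at the cost of a longer index string.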

6.3.3 Interposing of parallel parts

Two slightly different versions of the interposing of parallel parts are exemplified in LIPHIS and PASI. The difference may be summed up by saying that, when the lead term is in one of the parallel parts, LIPHIS shunts the parallel parts while PASI cycles them. For example, the LIPHIS input string
@Effects 1 of Feldspar & Quartz & Slate = 1 on Lungs of Rats
produces the index strings:
  1. Feldspar
        & Quartz & Slate.  Effects on Lungs of Rats
  2. Lungs
        of Rats. Effects of Feldspar & Quartz & Slate
  3. Quartz
        & Slate & Feldspar. Effects on Lungs of Rats
  4. Rats
        Lungs. Effects of Feldspar & Quartz & Slate
  5. Slate
        & Quartz & Feldspar. Effects on Lungs of Rats
By comparison, the PASI input string
Rats, Lungs, Feldspar: Quartz: Slate, *Effect
yields the index strings:
  1. Feldspar: Quartz: Slate,
        Effect; Rats, Lungs,
  2. Lungs,
        Feldspar: Quartz: Slate, Effect; Rats
  3. Quartz: Slate: Feldspar,
        Effect; Rats, Lungs,
  4. Rats,
        Lungs, Feldspar: Slate: Quartz, Effect
  5. Slate: Feldspar: Quartz,
        Effect; Rats, Lungs,
The difference in citation order is evident in the index string beginning with "Slate".  Either version is possible in NEPHIS, but the coding is rather clumsy. { 140}
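The two orderings can be expressed compactly in Python. The rules below are inferred only from the three-term examples above and are an assumption rather than a statement of how LIPHIS or PASI is implemented: "shunting" is taken to mean the lead term, then the parts that followed it, then the parts that preceded it in reverse order; "cycling" is taken to mean a simple rotation. The function names are hypothetical.

```python
PARTS = ["Feldspar", "Quartz", "Slate"]

def shunt(parts, lead):
    """LIPHIS-like order: lead, following parts, preceding parts reversed."""
    i = parts.index(lead)
    return [lead] + parts[i + 1:] + parts[:i][::-1]

def cycle(parts, lead):
    """PASI-like order: rotate the list so that the lead term comes first."""
    i = parts.index(lead)
    return parts[i:] + parts[:i]

for lead in PARTS:
    print(lead, "| shunt:", " & ".join(shunt(PARTS, lead)),
          "| cycle:", ": ".join(cycle(PARTS, lead)))
```

The two functions agree for "Feldspar" and "Quartz" and differ only for "Slate", which matches the index strings shown.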

Some arguments can be raised against the interposition technique.  For example, clarity may suffer as the parallel parts become more complex than the single terms of the examples. In any case, the resulting subarrangement of index entries by terms parallel to the lead term may be considered to provide inferior collocation. Take, for instance, searchers looking up the name of a mineral such as feldspar. These searchers may consider that a subarrangement of the index entries by other minerals mentioned in parallel is not particularly useful. Instead, they may prefer subarrangement into "effects", "analysis", "mining", and so on.

6.3.4 Postposing of parallel parts

Addition of the other parallel parts at the end of the index string is seen in Relational Indexing. Take, for example, the Relational Indexing input string corresponding to "definitions of basic terms in computer science and information science":
v=1;s=computer science
v=1;s=information science
v=1;s=terms [basic]
v=1;s=definitions
l=1;w=1;a=12;r=7;p=in;l=1;w=3
l=1;w=2;a=11;r=7;p=in;w=3
w=3;r=7;l=1;w=4 w=3;r=7;w=4
Here, "computer science" and "information science" are parallel terms. In the resulting index string, when one of these parallel terms is the lead term, the other is placed last in the index string; otherwise, both terms are placed together, joined by "and":
  1. Computer science
        terms, basic, definitions. Information science and -,
  2. Definitions
        of terms, basic, in information science and computer science.
  3. Information science
        terms, basic, definitions. Computer science and -,
  4. Terms
        in computer science and information science. Definitions of -, Basic -,
This technique seems to perform well on all qualities except clarity, which is somewhat impaired by the unfamiliar order.
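The placement rule for the parallel terms can be sketched in Python as follows. This is an illustration only, with a hypothetical function name; it does not parse Relational Indexing input strings or build complete index strings, and it does not model the order in which the joined terms appear when the lead term lies outside the parallel parts.

```python
PARALLEL = ["computer science", "information science"]

def arrange_parallel(parallel, lead):
    """Return (inline, postposed) text for a set of parallel terms.

    If the lead term is one of the parallel terms, the others are postposed
    to the end of the index string, with a dash standing for the lead term.
    Otherwise all of the parallel terms stay together, joined by "and".
    """
    if lead in parallel:
        others = [term for term in parallel if term != lead]
        postposed = " and ".join(others) + " and -"
        return None, postposed[0].upper() + postposed[1:]
    return " and ".join(parallel), None

print(arrange_parallel(PARALLEL, "computer science"))
print(arrange_parallel(PARALLEL, "definitions"))
```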

PRECIS does not regularly employ postposing of parallel parts. It does something similar, however, when two parts are connected by an "author { 141} attributed association", marked by the "t" code in column 3. Take, for example, an input string representing a "comparison of Christianity and Judaism in Europe":

$z01030$dEurope
$z11030$aChristianity
$zt0030$acompared to
$z11030$aJudaism
The author of the indexed item is seen as focusing equally on Christianity and Judaism, and in this sense the terms are parallel in the description. The PRECIS index string generator accordingly produces the index strings:
  1. Christianity. Europe
        compared to Judaism
  2. Europe
        Christianity compared to Judaism
  3. Judaism. Europe
        compared to Christianity

Chapter 6 Summary

Terms are sometimes omitted selectively from index strings in which they are not essential, usually for reasons of efficiency in searching or index production. Such omission may or may not mean a significant loss of detail.

Where detail is sacrificed, search effectiveness often suffers. Nevertheless, different amounts and kinds of detail are important to different groups of searchers and different information needs. Thus, both KWPSI and NETPAD permit some control of term omission by people other than the indexer.

Terms omitted without sacrifice of detail include substitutes, alternates, lead-only terms, and terms used to increase eliminability or collocation.  Alternates are not usually desirable except near the beginning of the index string. Their main purpose is to avoid problems with predicting which terms the index uses to represent an idea. Substitutes take account of the different ways in which searchers use different parts of the index string. They may be especially required in languages where nouns have a case structure. Lead-only terms are often generic to other terms in the descriptions. While usually specified in the input string, they may also be derived from a thesaurus. The "basic" version of POPSI illustrates what happens when terms not needed for detail are not selectively omitted.

Repetition of a term from the input string in more than one place in the same index string is common in some string indexing systems. Even some systems which normally avoid such repetition permit it after adjectives and prepositions. { 142}

Special treatment of parallel parts of descriptions is applied mostly when the lead term falls within one of the parallel parts. Several techniques are applied to ensure consistent treatment of access terms within parallel parts. The omission technique takes two forms depending on the treatment of lead terms outside the parallel parts; in both forms, parallel parts are omitted when the lead term falls within a parallel part. Other techniques are: the repetition of the parallel part in context; the interposing of the other parallel parts; and the postposing of the other parallel parts to the end of the index string.
