Using the graphic editing capability of the system, an indexer defines concept nodes by typing in appropriate terms, which are then displayed in a "staircase" across the screen. The indexer can also define links between concept nodes; these links are marked by one-character mnemonics, accompanied by slashes and, where the linked nodes are distant, by vertical and horizontal lines. For example, in indexing an item on "ignition of methane in the air in coal mines by sparks", the indexer might build up the display:
IGNITION
|/O
| METHANE
| /I
| AIR
/B |
-- |--SPARKS
/I /I
---COAL MINES
Here, the "/O" link indicates object; the "/I" links,
environment; and the "/B" link, agent.
A different set of link types is defined for each database, and these definitions may be changed at will. Part of the definition of a link type is its one-character mnemonic. Other parts are a pair of "connectives" and a pair of "weights". Connectives are words, phrases, or punctuation marks that the software uses to express a link of that type in the permuted index entries. Weights are values indicating the strength of the link relative to other links; the software uses them in determining citation order, heading-subheading division, and specificity of the permuted index entries.
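A link-type definition of this kind can be pictured as a small record. The sketch below is illustrative only: the class and function names are hypothetical, and we assume (the text does not say) that the two members of each pair apply to the two directions in which a link can be read. The forward connectives "of", "in", and "by" are guesses modelled on the entries elsewhere in this paper; the reverse connectives and all weight values are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LinkType:
    """One database-specific link type; the field names are illustrative."""
    mnemonic: str                  # one character, shown after "/" in the graphic display
    connectives: tuple[str, str]   # assumed: wording for the two directions of the link
    weights: tuple[int, int]       # assumed: relative strength for the two directions

    def __post_init__(self) -> None:
        if len(self.mnemonic) != 1:
            raise ValueError("a link-type mnemonic must be exactly one character")


# Hypothetical definitions for the three link types in the IGNITION example.
LINK_TYPES = {
    lt.mnemonic: lt
    for lt in (
        LinkType("O", ("of", "subjected to"), (30, 25)),   # object
        LinkType("I", ("in", "as setting for"), (10, 5)),  # environment
        LinkType("B", ("by", "acting on"), (20, 15)),      # agent
    )
}


def connective_for(marker: str, reverse: bool = False) -> str:
    """Map a display marker such as "/O" to the connective used in an index entry."""
    link_type = LINK_TYPES[marker.lstrip("/")]
    return link_type.connectives[1 if reverse else 0]


print(connective_for("/O"), connective_for("/B"))  # of by
```

Because each database has its own table, redefining a link type amounts to replacing one entry in `LINK_TYPES`.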
In the type of index display originally developed for this system, the user's control over linktype definitions and over threshold values associated with the weights provides many possibilities for variation in the format of index entries. For example, different permuted index entries beginning with "IGNITION" that could all correspond to the graphic display given above include:
On the other hand, all such variations of display format assume a particular kind of query with a particular kind of relationship to the corresponding index entries. Specifically, the searcher is expected to type in a single term or a single string of initial characters, and every permuted index entry displayed in response will begin with that term or that string of characters.
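For contrast with the formats discussed later, the query-dependent lookup just described amounts to a prefix search over the sorted permuted index. The sketch assumes, for illustration, that the entries are held as plain strings in one sorted list; the function name is hypothetical. Because all strings sharing a prefix are contiguous in sorted order, a binary search finds the first one and a short scan collects the rest.

```python
from __future__ import annotations

import bisect


def entries_starting_with(sorted_entries: list[str], prefix: str) -> list[str]:
    """Return every entry that begins with `prefix` (a term or a string of initial characters)."""
    start = bisect.bisect_left(sorted_entries, prefix)  # first entry >= prefix
    end = start
    # Entries sharing the prefix form one contiguous run in sorted order.
    while end < len(sorted_entries) and sorted_entries[end].startswith(prefix):
        end += 1
    return sorted_entries[start:end]


index = sorted([
    "ACQUISITION of ART OBJECTS . ETHICS",
    "ACQUISITION of BOOKS by LIBRARIES . SELECTION",
    "ACQUISITION of BRAZILIAN CITIZENSHIP . LAW",
    "EFFECTS of BILINGUALISM of CHILDREN relating to ACQUISITION of LANGUAGE SKILLS",
])

print(entries_starting_with(index, "ACQUISITION of B"))
# ['ACQUISITION of BOOKS by LIBRARIES . SELECTION', 'ACQUISITION of BRAZILIAN CITIZENSHIP . LAW']
```

Note that the EFFECTS entry is never returned for "ACQUISITION", even though it contains that term: in this format an entry is displayed only if it begins with the search string.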
These assumptions about the query and its relationship to the index display seem fairly appropriate as long as the number of index entries (across all descriptions in the database) that begin with a given term remains relatively small. User effort is saved by the use of simple queries, and the displays presented in response can be scanned quickly.
For prolific terms, however, the number of index entries may become unmanageably large. Suppose, for example, that the database has a coverage similar to that of the British Library BLAISE database (British Library Automated Information Service 1979). A searcher who enters the search term "ACQUISITION" may then be faced with many screenfuls of index entries, starting with something like:
ACQUISITION of AGRICULTURAL LAND in GREAT BRITAIN . SURVEYS &
    REPORTS
ACQUISITION of ART OBJECTS . ETHICS
ACQUISITION of BOOKS by LIBRARIES . SELECTION
ACQUISITION of BOOKS by LIBRARIES of UNIVERSITIES in UNITED
    STATES . SELECTION . APPROVAL PLANS . SURVEYS &
    REPORTS
ACQUISITION of BOOKS by PUBLIC LIBRARIES in GREAT BRITAIN .
    SELECTION
ACQUISITION of BOOKS by PUBLIC LIBRARIES in GREAT BRITAIN
    from BOOKSELLERS . DELAY . SURVEYS & REPORTS
ACQUISITION of BRAZILIAN CITIZENSHIP . LAW
ACQUISITION of BRITISH CITIZENSHIP . STATISTICS
ACQUISITION of CHILDREN'S BOOKS by LIBRARIES in UNITED STATES
    . SELECTION . READINGS
ACQUISITION of CHILDREN'S STORIES by CHILDREN'S LIBRARIES in
    CANADA . EFFECTS relating to PUBLISHING of
    CHILDREN'S STORIES
ACQUISITION of COGNITIVE SKILLS . SIMULATION . USE of HACKER
    PROGRAM
ACQUISITION of COMPANIES in EUROPEAN COMMUNITY . PRACTICAL
    INFORMATION
ACQUISITION of CONCEPTS of DEVELOPMENT of ANIMALS by CHILDREN
ACQUISITION of CONCEPTS of PROBABILITIES & CHANCE by CHILDREN
ACQUISITION of DOCUMENTS relating to SOUTHEAST ASIA by
    LIBRARY of AUSTRALIAN NATIONAL UNIVERSITY .
    PROPOSALS
To avoid excessively long index displays, as well as for other reasons, searchers may prefer to be able to enter other kinds of search specifications. Search specifications involving Boolean logic are one obvious example, but the use of other kinds of complex specifications may also be desirable: simple lists of search terms; lists of weighted search terms; substructures to be matched against parts of descriptions, as in TOSAR (Fugmann and others 1974) or Relational Indexing (Farradane 1980a, 1980b; Farradane and Thompson 1980); citations to known relevant documents; and so on.
Apart from query-independence and customizability, the new format has one other major characteristic and two major preferences. The major characteristic is that it includes all the terms in the description. Its major preferences are: 1. a term order in which qualifying terms follow the terms that they qualify; 2. the use of connectives to distinguish the types of links between concepts.
How the format behaves will be shown by way of an example. Suppose that the Boolean query "ACQUISITION AND CHILD*" retrieves descriptions for seven documents:
If the documents have been appropriately indexed, the new format typically gives the following display:
ACQUISITION of CONCEPTS of DEVELOPMENT of ANIMALS by
    CHILDREN
ACQUISITION of CONCEPTS of PROBABILITIES & CHANCE by CHILDREN
ACQUISITION of LANGUAGE SKILLS by CHILDREN
                                           & BABIES
EFFECTS of ACQUISITION of CHILDREN'S STORIES by CHILDREN'S
    LIBRARIES in CANADA relating to PUBLISHING of
    CHILDREN'S STORIES
EFFECTS of BILINGUALISM of CHILDREN relating to ACQUISITION
    of LANGUAGE SKILLS
READINGS relating to SELECTION of CHILDREN'S BOOKS by
    LIBRARIES in UNITED STATES relating to ACQUISITION
Before proceeding further, a couple of points should be noted about this display. First, the entries "ACQUISITION of LANGUAGE SKILLS by CHILDREN" and "ACQUISITION of LANGUAGE SKILLS by CHILDREN & BABIES" are grouped together under a common heading "ACQUISITION of LANGUAGE SKILLS by CHILDREN"; this grouping can be eliminated, if desired, by lowering the "subheading" threshold value. Second, the number of linktypes has been limited by the database designer, with the "relating to" linktype serving as a sort of catchall; this accounts for the rather stilted expressions "relating to PUBLISHING" and "relating to ACQUISITION".
Each of the entries is independent of the query. For example, the first entry would remain "ACQUISITION of CONCEPTS of DEVELOPMENT of ANIMALS by CHILDREN" regardless of whether the query were "ACQUISITION AND CHILD*", "CHILDREN", "ANIMALS AND DEVELOPMENT", "CONCEPT* OR IDEA*", or any other Boolean or nonBoolean formulation satisfied by the document in question.
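Query independence can be illustrated with a small sketch in which the query only selects documents and never alters what is shown. It assumes, for illustration only, that each document's entry has already been generated from its description and that descriptions are stored as sets of upper-case terms; the flat Boolean syntax (AND binding more tightly than OR, "*" for right truncation, no parentheses) is our own simplification, not the system's query language, and all function names are hypothetical.

```python
from __future__ import annotations


def term_matches(query_term: str, terms: frozenset[str]) -> bool:
    """Whole-term comparison; a trailing "*" asks for right truncation (e.g. CHILD*)."""
    q = query_term.strip().upper()
    if q.endswith("*"):
        return any(term.startswith(q[:-1]) for term in terms)
    return q in terms


def matches(query: str, terms: frozenset[str]) -> bool:
    """Flat Boolean query: AND binds more tightly than OR; no parentheses or NOT."""
    return any(
        all(term_matches(t, terms) for t in disjunct.split(" AND "))
        for disjunct in query.split(" OR ")
    )


def display(query: str, documents: list[tuple[frozenset[str], str]]) -> list[str]:
    """The query only selects documents; each entry shown is the stored, unchanged one."""
    return sorted(entry for terms, entry in documents if matches(query, terms))


documents = [
    (frozenset({"ACQUISITION", "CONCEPTS", "DEVELOPMENT", "ANIMALS", "CHILDREN"}),
     "ACQUISITION of CONCEPTS of DEVELOPMENT of ANIMALS by CHILDREN"),
    (frozenset({"READINGS", "SELECTION", "CHILDREN'S BOOKS", "LIBRARIES",
                "UNITED STATES", "ACQUISITION"}),
     "READINGS relating to SELECTION of CHILDREN'S BOOKS by LIBRARIES "
     "in UNITED STATES relating to ACQUISITION"),
]

for query in ("ACQUISITION AND CHILD*", "CHILDREN", "ANIMALS AND DEVELOPMENT", "CONCEPT* OR IDEA*"):
    print(query, "->", display(query, documents))
```

Each of the four queries retrieves the first document, and in every case its entry is the same string.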
All the terms are also included in each entry. The result is that each entry is a more or less complete description of the document.
The format's preferences for postposing qualifiers and for distinguishing linktypes through connectives are fully expressed here. In longer descriptions, notably in "EFFECTS of ACQUISITION of CHILDREN'S STORIES by CHILDREN'S LIBRARIES in CANADA relating to PUBLISHING of CHILDREN'S STORIES", the result may be somewhat difficult to follow; in shorter descriptions, however, the meaning is generally clear and quickly assimilated.
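Both preferences, together with the role of weights in citation order, can be sketched with a small recursive renderer. This is a minimal illustration under our own assumptions, not the system's code: a description is held as a tree of hypothetical `Concept` nodes, links are keyed by their connective rather than by a full link-type record, each qualifier is written after the term it qualifies, and sibling qualifiers are ordered by descending weight. The weight values are invented; only their relative order matters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept node and the qualifiers linked to it (illustrative structure)."""
    term: str
    links: list[tuple[str, Concept]] = field(default_factory=list)  # (connective, qualifier)


def render(node: Concept, weights: dict[str, int]) -> str:
    """Write a term, then each qualifier after it, strongest link first.

    sorted() is stable, so links of equal weight keep the order in which they were entered.
    """
    parts = [node.term]
    for connective, child in sorted(node.links, key=lambda link: -weights[link[0]]):
        parts.append(f"{connective} {render(child, weights)}")
    return " ".join(parts)


description = Concept("ACQUISITION", [
    ("of", Concept("CONCEPTS", [("of", Concept("DEVELOPMENT", [("of", Concept("ANIMALS"))]))])),
    ("by", Concept("CHILDREN")),
])

print(render(description, {"of": 30, "by": 20}))
# ACQUISITION of CONCEPTS of DEVELOPMENT of ANIMALS by CHILDREN
print(render(description, {"of": 20, "by": 30}))
# ACQUISITION by CHILDREN of CONCEPTS of DEVELOPMENT of ANIMALS
```

Raising the weight of "by" above that of "of" is enough to move "by CHILDREN" ahead of "of CONCEPTS" without changing the description itself.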
Suppose first that the user changes the definition of the "by" linktype so that it has a higher weight than the "of" linktype. The resulting display is:
ACQUISITION by CHILDREN & BABIES of LANGUAGE SKILLS
ACQUISITION by CHILDREN of CONCEPTS of DEVELOPMENT of ANIMALS
ACQUISITION by CHILDREN of CONCEPTS of PROBABILITIES & CHANCE
ACQUISITION by CHILDREN of LANGUAGE SKILLS
EFFECTS of ACQUISITION by CHILDREN'S LIBRARIES in CANADA of
    CHILDREN'S STORIES relating to PUBLISHING of
    CHILDREN'S STORIES
EFFECTS of BILINGUALISM of CHILDREN relating to ACQUISITION
    of LANGUAGE SKILLS
READINGS relating to SELECTION by LIBRARIES in UNITED STATES
    of CHILDREN'S BOOKS relating to ACQUISITION
Here, more emphasis is being placed on the agent as a way of
distinguishing one process from another and less on the patient;
e.g., more on "children and babies" and less on "language skills"
in the first entry. In a longer display, such a change might have
important implications for the grouping of the entries; here, it
only serves to separate slightly the two entries relating to
"language skills".
Second, suppose that the user raises the subheading threshold above the weight of the "of" linktype. The resulting display is:
ACQUISITION by CHILDREN & BABIES of LANGUAGE SKILLS
ACQUISITION by CHILDREN of CONCEPTS of DEVELOPMENT of ANIMALS
                                    of PROBABILITIES & CHANCE
                        of LANGUAGE SKILLS
EFFECTS of ACQUISITION by CHILDREN'S LIBRARIES in CANADA of
    CHILDREN'S STORIES relating to PUBLISHING of
    CHILDREN'S STORIES
        of BILINGUALISM of CHILDREN relating to ACQUISITION of
            LANGUAGE SKILLS
READINGS relating to SELECTION by LIBRARIES in UNITED STATES
    of CHILDREN'S BOOKS relating to ACQUISITION
The display is more compact and, in that respect, easier to scan;
on the other hand, the person scanning it may have difficulty in
attaching, to the line "of PROBABILITIES & CHANCE", the
appropriate heading-plus-subheading "ACQUISITION by CHILDREN of
CONCEPTS".
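The grouping effect can be sketched as suppression of the leading segments that an entry shares with the entry before it. In this sketch the entries are assumed to be already sorted and already divided into heading and subheading segments; how the threshold decides the division points is not modelled, and the segment boundaries in the example are our reconstruction of the display above rather than output of the original software. The function name is hypothetical.

```python
from __future__ import annotations


def grouped_display(entries: list[list[str]]) -> list[str]:
    """Lay out sorted entries, blanking the segments shared with the previous entry.

    Each entry is a list of segments: a heading followed by subheadings. Shared
    leading segments are replaced by spaces so that the remainder starts in the
    column where it would otherwise have begun. At least one segment is always shown.
    """
    lines = []
    previous: list[str] = []
    for segments in entries:
        shared = 0
        while (shared < len(segments) - 1 and shared < len(previous)
               and segments[shared] == previous[shared]):
            shared += 1
        indent = sum(len(segment) + 1 for segment in segments[:shared])
        lines.append(" " * indent + " ".join(segments[shared:]))
        previous = segments
    return lines


entries = [
    ["ACQUISITION by CHILDREN & BABIES", "of LANGUAGE SKILLS"],
    ["ACQUISITION by CHILDREN", "of CONCEPTS", "of DEVELOPMENT of ANIMALS"],
    ["ACQUISITION by CHILDREN", "of CONCEPTS", "of PROBABILITIES & CHANCE"],
    ["ACQUISITION by CHILDREN", "of LANGUAGE SKILLS"],
]
print("\n".join(grouped_display(entries)))
```

The third and fourth lines come out indented under "of DEVELOPMENT" and "of CONCEPTS" respectively, which is exactly where a reader has to look back to recover the full heading.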
Third, suppose that the user raises the cutoff threshold somewhat, above the weights of the "in" linktype and of the catchall "relating to" linktype. The resulting display is:
ACQUISITION by CHILDREN & BABIES of LANGUAGE SKILLS
ACQUISITION by CHILDREN of CONCEPTS of DEVELOPMENT of ANIMALS
                                    of PROBABILITIES & CHANCE
                        of LANGUAGE SKILLS
EFFECTS of ACQUISITION by CHILDREN'S LIBRARIES of CHILDREN'S
    STORIES . PUBLISHING of CHILDREN'S STORIES . CANADA
        of BILINGUALISM of CHILDREN . ACQUISITION by CHILDREN of
            LANGUAGE SKILLS
READINGS . SELECTION by LIBRARIES of CHILDREN'S BOOKS .
    ACQUISITION by LIBRARIES of CHILDREN'S BOOKS .
    UNITED STATES
Because the query-independent format is required to include all terms in every entry, raising the cutoff threshold does not shorten the entries, as it does in the original query-dependent format. Instead, the typical effect is to chop up an entry into several segments separated by periods. When these segments are added together, the overall entry may in fact be longer than it would be with the cutoff threshold lower. Note the repetition of "by LIBRARIES of CHILDREN'S BOOKS" in the final entry above.
For longer entries, the chopping up may improve readability, especially when a relatively troublesome linktype is brought below the threshold and need no longer be expressed. Thus, the entry "EFFECTS of ACQUISITION by CHILDREN'S LIBRARIES of CHILDREN'S STORIES . PUBLISHING of CHILDREN'S STORIES . CANADA" is probably clearer and more readable than "EFFECTS of ACQUISITION by CHILDREN'S LIBRARIES in CANADA of CHILDREN'S STORIES relating to PUBLISHING of CHILDREN'S STORIES", even though the latter expresses more of the links between concepts explicitly.
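The chopping behaviour can be sketched as follows. This is a minimal illustration under our own assumptions, not the system's algorithm: links are keyed by connective, the weights and cutoff values are hypothetical, every term is kept, and a qualifier whose link falls below the cutoff is detached and rendered as its own period-separated segment, shallowest first. With these choices the sketch reproduces both EFFECTS entries compared above; it does not attempt the repetition of context seen in the READINGS entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    term: str
    links: list[tuple[str, Concept]] = field(default_factory=list)  # (connective, qualifier)


def render_with_cutoff(root: Concept, weights: dict[str, int], cutoff: int) -> str:
    """Express only links whose weight reaches the cutoff; keep every term regardless."""
    detached: list[tuple[int, Concept]] = []  # (depth, subtree) awaiting its own segment

    def inline(node: Concept, depth: int) -> str:
        parts = [node.term]
        for connective, child in sorted(node.links, key=lambda link: -weights[link[0]]):
            if weights[connective] >= cutoff:
                parts.append(f"{connective} {inline(child, depth + 1)}")
            else:
                detached.append((depth + 1, child))  # link not expressed; term survives
        return " ".join(parts)

    segments = [inline(root, 0)]
    while detached:
        detached.sort(key=lambda item: item[0])  # stable sort: shallower subtrees first
        depth, subtree = detached.pop(0)
        segments.append(inline(subtree, depth))
    return " . ".join(segments)


effects = Concept("EFFECTS", [
    ("of", Concept("ACQUISITION", [
        ("by", Concept("CHILDREN'S LIBRARIES", [("in", Concept("CANADA"))])),
        ("of", Concept("CHILDREN'S STORIES")),
    ])),
    ("relating to", Concept("PUBLISHING", [("of", Concept("CHILDREN'S STORIES"))])),
])
weights = {"by": 30, "of": 20, "in": 10, "relating to": 5}

print(render_with_cutoff(effects, weights, cutoff=0))   # every link expressed
print(render_with_cutoff(effects, weights, cutoff=15))  # "in" and "relating to" dropped
```

Raising the cutoff above every weight leaves no link expressed, so the output degenerates into a plain list of the seven terms, the situation discussed next.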
The requirement of retaining all terms even when the cutoff threshold is raised derives basically from the difficulty of deciding, without reference to any query, which terms should be retained and which might safely be dropped. A simplistic rule such as retaining only the initial terms could yield generally unhelpful entries such as an unqualified "READINGS".
As the cutoff threshold is raised further, the ultimate result is for no links to be expressed explicitly in any of the entries. Each entry thus becomes simply a list of terms; e.g.,
ACQUISITION . CONCEPTS . CHILDREN . CHANCE . PROBABILITIES
                       . DEVELOPMENT . ANIMALS . CHILDREN
            . LANGUAGE SKILLS . CHILDREN
                                         . BABIES
EFFECTS . ACQUISITION . CHILDREN'S STORIES . CHILDREN'S
    LIBRARIES . PUBLISHING . CANADA . CHILDREN'S
    STORIES
        . BILINGUALISM . ACQUISITION . LANGUAGE SKILLS . CHILDREN
READINGS . SELECTION . ACQUISITION . CHILDREN'S BOOKS .
    LIBRARIES . UNITED STATES
In general, such entries are likely to be relatively less useful
to searchers than entries in which at least some types of concept
links are indicated.
The requirement for query independence restricts the format quite severely. Notably, it forces every term to be included in every variation of an entry.
The strong preference for postposing qualifying terms may not be the best choice. A mixed system in which certain essentially qualifying terms precede the terms that they qualify, as in many existing systems of subject headings, including the "feature" headings of PRECIS (Austin and Dykstra 1984), might be more appropriate.
Another area for future investigation is the development of more sophisticated query-dependent entry formats. In this area, formats specially suited to the display of responses to Boolean queries seem especially important.
Notes, 2010: Program files for NETPAD are no longer available.
Last updated June 15, 2010, by
Tim Craven