
APPENDIX C
A CASE STUDY OF STRING INDEXING

The purpose of this appendix is to show at some length how a string indexing system is used to create an index display. The particular index displays chosen are the five-year cumulative indexes to The Canadian Journal of Information Science (Craven 1980b, 1980c).

A major reason for choosing the CJIS cumulative indexes is the fact that the string indexing system used to produce them was NEPHIS. NEPHIS is far from being an ideal string indexing system, but it does have a number of features useful for purposes of illustration. First, the input strings are coded, and indexer control over index string syntax can readily be shown. Second, index string generation is neither too simple nor too complex: on the one hand, the procedure is sufficiently sophisticated to demonstrate some features of more sophisticated index string generators; on the other hand, it is sufficiently unsophisticated that it can be followed by the reader without extensive instruction. Third, NEPHIS software is readily available, notably versions for Commodore and IBM-PC-compatible microcomputers; and readers may care to follow the examples, and try alternatives of their own, with the aid of their own machines. Finally, of course, NEPHIS was designed by the author, who is thus uniquely familiar with its operation.

Other reasons for choosing the CJIS indexes as the case to be studied include the fact that the author was the indexer and index producer; the fact that the subject matter of the indexed items is likely to be somewhat familiar to many readers; and the availability of the indexed items as a published set.

C.1 Preparation for indexing CJIS

No previous index to CJIS had been produced, and the index producer was not required to adhere to a set index display format. Since CJIS contains articles in both English and French, however, both English and French index displays were needed. The use of NEPHIS was also specifically requested.

The index producer found no existing controlled vocabulary suited to NEPHIS indexing of materials in the field of information science. Accordingly, vocabulary control was carried out parallel to the actual indexing. The vocabulary used was based partly on that of the indexed items and partly on the experience of the index producer.

Rather than using NEPHIS' capacity for defining lead-only and alternate terms, it was decided to employ cross-references. Including lead-only and alternate terms in the input strings would have proved too clumsy in some cases.

Only subjects were to be indexed, with no terms indicating form such as "Book Review". Each of the two index displays, English and French, would supply access to all articles.

C.2 Indexing CJIS

The following discussion considers, item by item, the indexing of the 17 articles in the last of the five CJIS volumes indexed, that for 1980. The purpose is to demonstrate in some depth the kinds of decisions faced by the string indexer. Included will be a brief description of each article, the NEPHIS input string or strings assigned, the resulting index strings, and discussion of problems and alternatives.

Article 1
"Freedom of information: attitudes, rights, laws, and policy"

The title phrase "freedom of information" gives a fairly good idea of the subject of this article. An examination of the text, however, shows that freedom of information is discussed almost entirely as it relates to the Canadian federal government; topics considered include typical obstacles to citizens in gaining access to federal government information and future possibilities for federal freedom-of-information legislation.

The indexer judged that "freedom of information" was a fuzzy term and decided to prefer something like "access to government information" as more precise. Since many searchers were likely to think of the term "freedom of information", however, a cross-reference would be provided.

It had been decided to refer to all specific governments with the expression "government of" plus the geographical name. Thus, the Canadian federal government would have to be referred to as "government of Canada". The author speaks virtually entirely of access by Canadian citizens, and not by corporations, other governments, or noncitizens. Thus, the idea of "citizens of Canada" seemed necessary. To avoid "stuttering", however, "Canada" should be an access term in only one of the two expressions.

The final input string was

Access? to <Information? of <Government? of <Canada>>>? by <Citizens of Canada>
with the index strings:
  1. Access to Information of Government of Canada by Citizens of Canada
  2. Canada. Government. Information. Access by Citizens of Canada
  3. Citizens of Canada. Access to Information of Government of Canada
  4. Government of Canada. Information. Access by Citizens of Canada
  5. Information of Government of Canada. Access by Citizens of Canada
In addition, the indexer entered the cross-reference specification
Freedom of Information * SEE ALSO Access to Information
A cross-reference under "Information. Freedom of -" was not considered worthwhile: searchers specifically interested in "freedom of information" were thought likely to look up "Freedom"; searchers interested in material on "information" were expected to find index string 5 directly.

The duplicate term "Canada" illustrates a drawback of NEPHIS here. The term is desired after the lead term "Citizens" in index string 3 to improve eliminability, but its repetition makes all the index strings less succinct. In retrospect, the NEPHIS indexer might have been better advised to use simply "Citizens", or even "Public", instead of "Citizens of Canada", with the resulting loss of eliminability being of little real consequence.
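
The way the input string for article 1 yields its five index strings can be traced with a short Python sketch. This is not the NEPHIS software itself: it is a simplified reconstruction inferred only from the examples in this appendix, and the names `Phrase`, `parse`, `flat`, and `index_strings` are hypothetical. It assumes that `<` and `>` nest phrases, that `?` introduces a connective belonging to the bracket that follows it, that `@` suppresses the index string led by its phrase, that a hyphen replaces an omitted nested phrase which has no connective, and that a connective standing just before `>` replaces the period when reading outward.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Phrase:
    lead: bool = True                 # False when the phrase begins with '@'
    items: List = field(default_factory=list)  # text, or (connective, Phrase)
    back: Optional[str] = None        # connective found just before '>'


def parse(s: str) -> Phrase:
    """Build a phrase tree from a coded input string (assumed syntax)."""
    pos = 0

    def read_text() -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "<>?":
            pos += 1
        return s[start:pos]

    def read_phrase() -> Phrase:
        nonlocal pos
        node = Phrase()
        if s.startswith("@", pos):
            node.lead = False
            pos += 1
        pending = None                # connective waiting for its bracket
        while pos < len(s) and s[pos] != ">":
            if s[pos] == "?":
                pos += 1
                pending = read_text()
            elif s[pos] == "<":
                pos += 1
                child = read_phrase()
                pos += 1              # step over the closing '>'
                node.items.append((pending, child))
                pending = None
            else:
                node.items.append(read_text())
        node.back = pending
        return node

    return read_phrase()


def flat(node: Phrase, skip: Optional[Phrase] = None) -> str:
    """Read a phrase forward, leaving out the nested phrase `skip`."""
    parts = []
    for item in node.items:
        if isinstance(item, str):
            parts.append(item)
            continue
        connective, child = item
        if child is skip:
            if connective is None:
                parts.append("-")     # hyphen marks the omitted phrase
        else:
            parts.append((connective or "") + flat(child))
    return "".join(parts)


def index_strings(input_string: str) -> List[str]:
    """Return one index string per phrase not marked with '@', sorted."""
    results = []

    def visit(node: Phrase, ancestors: List[Phrase]) -> None:
        if node.lead:
            text, inner = flat(node), node
            for outer in reversed(ancestors):
                text += (inner.back or ". ") + flat(outer, skip=inner)
                inner = outer
            results.append(text.strip())
        for item in node.items:
            if not isinstance(item, str):
                visit(item[1], ancestors + [node])

    visit(parse(input_string), [])
    return sorted(results)


for line in index_strings(
    "Access? to <Information? of <Government? of <Canada>>>? by <Citizens of Canada>"
):
    print(line)
```

Under these assumptions the loop prints the five index strings listed above for article 1, in the same order. Readers without access to NEPHIS may find the sketch useful for trying alternative codings of the later input strings, bearing in mind that the real software may differ in details not visible in these examples.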

Article 2
"Computerized conferencing: an eye-opening experience with EIES"

This article discusses both computerized conferencing in general and the author's personal experiences with the EIES system in particular. Emphasis is placed on the psychological aspects of computerized conferencing.

It had been decided to deal with items which discuss more or less equally both a general topic and a specific topic by entering two input strings. Thus, one input string was required for "computerized conferencing" and another for "personal experiences with EIES". It was difficult to imagine searchers looking up the full name of EIES, "Electronic Information Exchange System". Thus, the acronym would be used, and without a cross-reference. On the grounds that a computerized conference is not really a conference, the indexer decided against making "Conference" or "Conferencing" an access term and treated "Computerized Conferencing" as a single term. "Computerized" was preferred to "Computer" as less ambiguous. The expression "Substitution of Telecommunications for Meetings" had been established in earlier indexing, and would form the basis for cross-references to the new term.

The first input string was

Computerized Conferencing
with the index string
  1. Computerized Conferencing
The second input string was
@Personal Experiences? with <EIES>
with the index string
  1. EIES. Personal Experiences
The cross-reference specification entered was
Substitution? of <Telecommunications>? for <Meetings> * SEE ALSO Computerized Conferencing

In retrospect, access via "Electronic" might also have been considered. One way of providing such access would, of course, have been through the full name of EIES. More generally useful, however, would have been a cross-reference from "Electronic" to "Computerized". The decision not to provide access via "Conference" or "Conferencing" might also have been reconsidered.

Article 3
"Computer conferencing and the development of an electronic journal"

This article deals with a specific use of the EIES system; namely, the construction of an online equivalent of a printed scholarly journal. Although experimental, this electronic journal has a name, Mental Workload.

It had been decided that the descriptions of an indexed item should fit the actual topic of the item and not some broader topic. Here, the actual topic was a specific electronic journal. To distinguish this electronic journal from the concept of mental workload, however, the indexer had to include a parenthetical qualifier such as "(Electronic Journal)". In addition, cross-references would be needed from "Computerized Conferencing", "Electronic Journals", and "EIES".

The input string was

Mental Workload (Electronic Journal)
with the index string
  1. Mental Workload (Electronic Journal)
The three cross-reference specifications added were:
Computerized Conferencing * SEE ALSO Mental Workload (Electronic Journal)

EIES * SEE ALSO Mental Workload (Electronic Journal)

Electronic Journals * SEE ALSO Mental Workload (Electronic Journal)

For this particular item, the employment of cross-references seems somewhat laborious and possibly wasteful of the searcher's time. In the interests of consistency, however, the general rule of cross-referencing was adhered to.

Article 4
"Systèmes d'information visuelle utilisant les fibres optiques" ("Visual information systems using optical fibres")

This article considers successively, and in some detail, three related topics: optical fibres; computerized visual information systems; and the Hi-OVIS (Highly Interactive Optical Visual Information System) project in Japan. The emphasis on optical fibres is not obvious from the abstract, but is clear from section headings and illustrations, as well as from the text itself.

Given the effective absence of a topic for the article as a whole, the indexer decided to index the three topics separately. "Optical fibres" and "Hi-OVIS" could be treated as single terms. "Computerized visual information systems", however, would benefit from further analysis. The term "Computerized Systems" had already been established in earlier indexing, as had the expression "Computerized Systems for Control of". Indeed, the expression "Computerized Systems for Control of Information on Law" had already been used to index an earlier article. Thus, the indexer favored "Computerized Systems for Control of Visual Information". "Visual Information" could not readily be broken down into nouns and prepositions, and the adjective-noun form was therefore retained.

The first input string was

Optical Fibres
with the index string:
  1. Optical Fibres
The second input string was
Computerized Systems? for <Control? of <Visual <Information>>>
with the index strings:
  1. Computerized Systems for Control of Visual Information
  2. Control of Visual Information. Computerized Systems
  3. Information. Visual -. Control. Computerized Systems
  4. Visual Information. Control. Computerized Systems
Finally, the third input string was
Hi-OVIS
with the index string:
  1. Hi-OVIS

Because the index entries from the second input string would lead searchers directly to the indexed item via several broader and related terms, no cross-references to "Hi-OVIS" were specified.

Article 5
"Researching office information communication systems"

This paper discusses research at B-N Software Research, Inc., on the experimental use of a computerized system for text editing, electronic mail, information retrieval, and other functions. The system's name is not clear: the acronym "OICS" appearing in the text may simply be a general abbreviation for "office information communication system".

The indexer decided to refer to the computerized office information system by means of a descriptive expression based on the "Computerized Systems for Control of" formula already established. Since only one system is discussed, however, the singular would be substituted for the plural.

The input string was

@Experimental Use? of <Computerized System? for <Control? of <Information? in <Offices>>>>? at <B-N Software Research>
with the index strings:
  1. B-N Software Research. Experimental Use of Computerized System for Control of Information in Offices
  2. Computerized System for Control of Information in Offices. Experimental Use at B-N Software Research
  3. Control of Information in Offices. Computerized System. Experimental Use at B-N Software Research
  4. Information in Offices. Control. Computerized System. Experimental Use at B-N Software Research
  5. Offices. Information. Control. Computerized System. Experimental Use at B-N Software Research

One problem was created by the use of the singular "System" rather than the corresponding plural. Index string 2 would file slightly apart from index strings beginning "Computerized Systems for Control of", with a resultant loss of collocation.

The choice of "Information in Offices" may not have been the best for future indexing. Although the information discussed in this article is in fact used in an office, it is also possible to speak of office-type information used by people in their homes or other locations. "Office Information" might prove in future to be a more generally useful expression.

The input string links "B-N Software Research" with "Experimental Use". "B-N Software Research" might easily instead have been linked with "Computerized System", since the computerized system, like the experimental use, was located at B-N Software Research.

Article 6
"A new perspective on quantitative information"

This article sets out a structure for thinking about quantitative information; for this purpose, it introduces dialogs between a human searcher and a hypothetical machine for manipulating this kind of information.

The indexer decided that the overall topic was best described simply as "theory of quantitative information". The phrase "quantitative information" did not analyze readily into the noun-preposition formula favored by NEPHIS: "information on quantities" would probably strike searchers as odd.

The input string was

@Theory? of <Quantitative <Information>>
with the index strings:
  1. Information. Quantitative -. Theory
  2. Quantitative Information. Theory

Article 7
"The peculiar and complex economic properties of information"

This brief article gives an overview of some of the attempts to apply economic theory to information. The indexer, following the author's wording fairly closely, decided on the subject description "economic properties of information".

The input string was

Economic Properties? of <Information>
with the index strings
  1. Economic Properties of Information
  2. Information. Economic Properties

If other indexed items had also considered economics extensively, the indexer might have been better advised to standardize on a somewhat more generally useful term than "Economic Properties"; for example, "Economic Aspects". The term chosen, however, was probably adequate under the circumstances.

The translation of the term "Economic Properties" into French did cause a problem. The term "propriétés économiques" would file nowhere near "économique" ("economics"). A cross-reference specification was therefore added:

économique * VOIR AUSSI propriétés économiques
Having regard to the French version, the indexer might better have analyzed the topic of the article as something like "applications of economics to the study of information".

Article 8
"Indexing for associative processing"

This article considers what kinds of indexing techniques would be suitable for automatic information retrieval systems based on associative processing hardware. The main characteristic of associative processing is the addressing of computer memory locations by their content rather than by coordinates.

The indexer decided simply to adopt the description provided by the title, "indexing for associative processing", and to treat "associative processing" as a single term.

The input string was

Indexing? for <Associative Processing>
with the index strings:
  1. Associative Processing. Indexing
  2. Indexing for Associative Processing

In retrospect, the results here seem capable of improvement. The description assigned seems somewhat too brief, and index string 1 appears rather cryptic. Inclusion of a term representing the idea of "automatic information retrieval systems" would probably have added usefully to detail and clarity. Access via the term "content-addressable" or the like might also have proved worthwhile.

Article 9
"Un modèle sémiotique de la traduction" ("A semiotic model of translation")

This article presents some ideas on the translation process, relating it especially to the process of reading.

As with article 8, the indexer chose to accept the description of the title; here, "semiotic model of translation". A general decision had been made not to consider "model" as a useful access term because of its vagueness; thus, "Semiotic Model" was treated as a single term.

The input string assigned was

Semiotic Model? of <Translation>
with the index strings
  1. Semiotic Model of Translation
  2. Translation. Semiotic Model

Here, as in the case of article 7, the French-language placing of adjectives after their modified nouns created a problem. The term "modèle sémiotique" ("semiotic model") would file far from the term "sémiotique" ("semiotics"), and an added cross-reference specification was accordingly required:

sémiotique * VOIR AUSSI modèle sémiotique
With a view to the French translation, as well as to the general NEPHIS pattern of nouns and prepositions, the indexer might have been better advised to use the English input string
@Model? of <Translation>? from <Semiotics>
with the index strings:
  1. Semiotics. Model of Translation
  2. Translation. Model from Semiotics
This approach, however, would present the problem of translating the distinction between "from" and "of" into French; for this purpose, some degree of periphrasis would be required.

"Reading" should perhaps also have been included as an access term, given the emphasis on this idea in the article. The overall topic might possibly have been described as a "model of translation derived from a model of reading" or "derivation of a model of translation from a model of reading".

Article 10
"Lodes of French-Canadian gold in US libraries"

The title of this article is obviously partly fanciful. Examination of the abstract shows that "French-Canadian gold" refers to "items in French which are primary sources for our knowledge of Canada before the middle of the 18th century". The gold in the article was sought, not directly in the US libraries themselves, but indirectly in the National Union Catalog: Pre-1956 Imprints. The search in question was an initial stage of the author's WHSTC (Western Hemisphere Short Title Catalog) project.

The expressions "Bibliographic Control of" and "Role of ... in" and the terms "French Language", "Books", and "Canada" had already been established. The use of the term "Control" especially would help to group this item with a number of other items. Thus, the indexer decided to describe the topic rather fully as "the role of the WHSTC project in the bibliographic control of books in the French language relating to the history of Canada cited in the National Union Catalog: Pre-1956 Imprints".

The input string was

@Role? of <WHSTC Project>? in <Bibliographic <Control>? of <Books? in <French Language>? relating to <History? of <Canada>> cited? in <National Union Catalog Pre-1956 Imprints>>>
with the index strings:
  1. Bibliographic Control of Books in French Language relating to History of Canada cited in National Union Catalog Pre-1956 Imprints. Role of WHSTC Project
  2. Books in French Language relating to History of Canada cited in National Union Catalog Pre-1956 Imprints. Bibliographic Control. Role of WHSTC Project
  3. Canada. History. Books in French Language cited in National Union Catalog Pre-1956 Imprints. Bibliographic Control. Role of WHSTC Project
  4. Control. Bibliographic - of Books in French Language relating to History of Canada cited in National Union Catalog Pre-1956 Imprints. Role of WHSTC Project
  5. French Language. Books relating to History of Canada cited in National Union Catalog Pre-1956 Imprints. Bibliographic Control. Role of WHSTC Project
  6. History of Canada. Books in French Language cited in National Union Catalog Pre-1956 Imprints. Bibliographic Control. Role of WHSTC Project
  7. National Union Catalog Pre-1956 Imprints. Books in French Language relating to History of Canada cited. Bibliographic Control. Role of WHSTC Project
  8. WHSTC Project. Role in Bibliographic Control of Books in French Language relating to History of Canada cited in National Union Catalog Pre-1956 Imprints

The treatment of the expression "cited in" is interesting. The indexer coded only the "in" part as a forward-reading connective, allowing "cited" to appear in all the index strings. In this way, the relationship of the lead term "National Union Catalog Pre-1956 Imprints" to the rest of index string 7 above was made clear. As a general rule, participles, like "cited", are avoided in NEPHIS indexing; but they are occasionally employed, as here, to improve the quality of particular index strings.

The input string assigned to article 10 was longer than the input string assigned to any other article, and no other article had more access terms. No guidelines had been worked out in advance explicitly to restrict the number of words used to describe an indexed item. The result was that some indexed items, like article 10, had long descriptions with up to seven access terms, while others had much shorter descriptions with only one or two access terms. Because of the large number of access terms assigned to article 10, the indexer decided that no cross-references were needed.

Article 11
"Aspects of pre-literate culture shared by on-line searching and videotex"

The title of this article gives a fairly good idea of its subject matter. The approach of the article is not, however, a strict comparison of pre-literate cultures and online searching. Instead, it mainly discusses how human thinking was affected by literacy and may be affected by online searching. Right- and left-brain specialization is given some attention. Relatively little is said about videotex. The online systems referred to appear to be largely textual. In addition to searching, other uses of online systems, such as electronic mail, are also mentioned.

Preliterate cultures are not heavily dealt with, and the word "preliterate" begins with a prefix and did not seem an especially important access term. The indexer therefore decided to omit this idea, analyzing the topic as "the effects of literacy and of the use of machine-readable textual databases on human information processing". For "human information processing", the more NEPHIS-like expression "Processing of Information by Human Beings" had already been established for an earlier indexed item and was adopted here also.

The input string was

@Effects? of <Literacy? & <@Use? of <Machine-readable <Textual <Databases>>>? & >>? on <@Processing? of <Information>? by <Human Beings>>
with the index strings:
  1. Databases. Textual -. Machine-readable -. Use & Literacy. Effects on Processing of Information by Human Beings
  2. Human Beings. Processing of Information. Effects of Literacy & Use of Machine-readable Textual Databases
  3. Information. Processing by Human Beings. Effects of Literacy & Use of Machine-readable Textual Databases
  4. Literacy & Use of Machine-readable Textual Databases. Effects on Processing of Information by Human Beings
  5. Machine-readable Textual Databases. Use & Literacy. Effects on Processing of Information by Human Beings
  6. Textual Databases. Machine-readable -. Use & Literacy. Effects on Processing of Information by Human Beings

The results illustrate a difficulty with NEPHIS' usual treatment of parallel or coordinate parts of descriptions: as one or more of the parallel parts becomes quite complex, clarity tends to suffer. The decline in clarity is especially noticeable in index strings 1, 5, and 6 above, where the expression "Use & Literacy" may require some effort at comprehension on the searcher's part.

In retrospect, especially in view of the rather speculative nature of the article, the input string might have been made somewhat shorter. The term "Textual", at least, could readily be sacrificed; and "Machine-readable" would not be needed if all databases were assumed to be machine-readable.

Article 12
"Anomalous states of knowledge as a basis for information retrieval"

After making some general points about information retrieval systems, this article presents the idea of "anomalous states of knowledge" (ASK); i.e., states of knowledge which cause people to feel needs for information, without necessarily being able to express these needs in a query. The ASK concept is discussed as having implications for information retrieval theory and design.

The indexer decided that the overall topic was "the applicability of the concept of anomalous states of knowledge to information retrieval". The acronym "ASK" had already been used in the unrelated term "ASK Service" and was not widely known in the "anomalous state of knowledge" meaning; moreover, both the acronym and the full form "Anomalous States of Knowledge" would file close together in the final index. The indexer therefore chose to prefer the full form and not to bother with a cross-reference.  For "information retrieval", the expression "retrieval of data" was substituted, as more accurate.

The input string was

@Applicability? of <@Concept? of <Anomalous States of Knowledge>>? to <Retrieval? of <Data>>
with the index strings:
  1. Anomalous States of Knowledge. Concept. Applicability to Retrieval of Data
  2. Data. Retrieval. Applicability of Concept of Anomalous States of Knowledge
  3. Retrieval of Data. Applicability of Concept of Anomalous States of Knowledge

Searchers looking under "Information" had already been taken care of with the cross-reference specification

Information * SEE ALSO Data
Nevertheless, "information retrieval" is such a common phrase in the field of information science that it might have been better to prefer "Information Retrieval" to "Retrieval of Data", or at least to provide a cross-reference such as
Information Retrieval * SEE Retrieval

Article 13
"Automation of small public libraries in Canada"

The title is quite indicative of the content of this article. "Automation", as it usually does nowadays, here means the use of computerized systems. "Small" is specifically defined by the author as "not members of the Council of Administrators of Large Urban Public Libraries".

The term "Computerized Systems" had already been assigned to a number of articles and "Use of Computerized Systems" had already been established in indexing an earlier article; the indexer therefore preferred this expression to "automation" and to other alternatives such as "computerization". The rather vague word "small" was retained, rather than a more exact description derived from the author's definition, which would be unnecessarily precise for indexing purposes. Following a pattern laid down in earlier work, the indexer linked "Small Public Libraries" to "Use" and not to "Computerized Systems"; this decision was also justified by the fact that the computerized systems discussed were not always in the libraries.

The input string was

@Use? of <Computerized Systems>? in <@Small <Public <Libraries>? in <Canada>>>
with the index strings:
  1. Canada. Public Libraries. Small -. Use of Computerized Systems
  2. Computerized Systems. Use in Small Public Libraries in Canada
  3. Libraries. Public - in Canada. Small -. Use of Computerized Systems
  4. Public Libraries in Canada. Small -. Use of Computerized Systems
A cross-reference specification already established would assist searchers interested in the topic of "automation":
Automated * SEE ALSO Computerized

Article 14
"Automating the management information systems of libraries"

As in article 13, here too "automation" means "computerization". More precisely, the article deals with various possibilities for computerized management information systems in libraries.

Because the term "MIS" had already been established in previous indexing, the indexer chose to describe the topic as "computerized MIS in libraries" rather than as "computerized systems for decision-making in management of libraries" or the like.

The input string was

Computerized <MIS>? in <Libraries>
with the index strings
  1. Computerized MIS in Libraries
  2. Libraries. Computerized MIS
  3. MIS. Computerized - in Libraries

An already-established cross-reference specification would provide indirect access to the article via "Management", "Decision-making", and "Information Systems":

Information Systems? for <Decision-making? in <Management>> * SEE MIS

Article 15
"La science de l'information à l'école de bibliothéconomie" ("Information science in a library school context")

This article begins with a historical review of various concepts of information science; it then goes on to consider the relationships between information science and library science; finally, it briefly discusses recent curriculum changes at the University of Montreal library school.

The terms "Information Science" and "Concept" had already been established in earlier indexing, as had "Library Science". Since the whole article deals with information science, but only a part with library science, the indexer decided to describe the topic simply as "Concepts of Information Science".

The input string was

@Concepts? of <Information Science>
with the index string:
  1. Information Science. Concepts

In retrospect, the provision of some kind of access via "Library Science" seems possibly desirable. Even if the article were not described as "Information Science & Library Science" or the like, a cross-reference from "Library Science" to "Information Science" might be useful.

Article 16
"Explorations into informatic geometry: computer generation of partially hierarchical classifications"

This article describes, and reports on the experimental use of, a method for clustering terms. In this method, factor analysis is employed to define coordinates for each term in two or more dimensions. In the experimental use described, only two-dimensional coordinates were applied, and terms were derived from both a simulated and a real collection of indexed items.

The indexer analyzed the overall topic as "use of factor analysis in definition of axes for clustering of terms". The term "axes" seemed more indicative than "coordinates". The collection of real indexed items to which the method was applied was ignored, because it was not important to the method and because a simulated collection was also used. The expression "Clustering of Terms" had already been established in earlier indexing.

The input string was

@Use? of <Factor Analysis>? in <@Definition? of <Axes? for <Clustering? of <Terms>>>>
with the index strings:
  1. Axes for Clustering of Terms. Definition. Use of Factor Analysis
  2. Clustering of Terms. Axes. Definition. Use of Factor Analysis
  3. Factor Analysis. Use in Definition of Axes for Clustering of Terms
  4. Terms. Clustering. Axes. Definition. Use of Factor Analysis
An already established cross-reference specification provided indirect access via "Classification":
Classification * SEE ALSO Clustering
No access was provided, however, via the terms "Geometry" or "Coordinates".

Article 17
"Information space"

This article suggests the idea of a special "information space", with different metrical properties from physical space, as a tool for solving quantitative problems in information science. A large portion of the article is devoted to relating the law of perspective to the Bradford Law of distribution.

The indexer chose to describe the overall topic as "the applicability of the concept of information space to the quantitative aspects of information science". The expressions "Applicability of ... to" and "Concept of" and the term "Information Science" had already been established in earlier indexing.

The input string was

@Applicability? of <@Concept? of <Information Space>>? to <Quantitative Aspects? of <Information Science>>
with the index strings:
  1. Information Science. Quantitative Aspects. Applicability of Concept of Information Space
  2. Information Space. Concept. Applicability to Quantitative Aspects of Information Science
  3. Quantitative Aspects of Information Science. Applicability of Concept of Information Space

In the final index, as it turned out, index strings 1 and 2 appeared together, with no other index string intervening. This close proximity can nevertheless be justified. The two ideas of "Information Science" and "Information Space" are quite distinct; a searcher looking up one of these terms is quite unlikely to examine an index entry beginning with the other in sufficient detail to perceive that the index entry is in fact relevant.

In retrospect, providing access via the terms "Bradford Law" and "Perspective" or the like might have been desirable. For example, the indexer could have assigned a second input string such as

@Derivation? of <Bradford Distribution>? from <Mathematical Model? of <Perspective>>
{ 209} Indirect access might also have been furnished via the terms "Space" and "Geometry".

C.3 Sorting and formatting the CJIS indexes

The sorting software available was quite crude and lacked special options. Specifically, users could not define their own character rankings. The index producer therefore took special steps to ensure a reasonably good filing order in the final index.

The first difficulty to be circumvented was the ASCII ranking of the space before the period. Left alone, this ranking would file a heading such as "Information Science" ahead of the shorter heading "Information", in conflict with the general-before-special ordering principle. The index producer accordingly instructed the computer to modify the index strings temporarily. Before sorting, a space was inserted before each period in the index strings; then, after sorting, these spaces were removed.
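The effect of this temporary modification can be illustrated with a minimal Python sketch. The function name general_before_special_sort is invented for illustration, and the sketch assumes that no index string already contains a space followed by a period. Python compares strings by character code, which for these plain ASCII strings gives the same ranking as the sorting software described.

```python
def general_before_special_sort(index_strings):
    """Sort so that 'Information.' files ahead of 'Information Science.'."""
    # Insert a space before each period, so that the heading ends in a space
    # and the period (ASCII 46) is compared with the next heading's letters.
    padded = [s.replace(".", " .") for s in index_strings]
    padded.sort()
    # Remove the temporary spaces again after sorting.
    return [s.replace(" .", ".") for s in padded]


sample = [
    "Information Space. Concept",
    "Information Science. Concepts",
    "Information. Economic Properties",
]

print(sorted(sample))                       # plain ASCII order
print(general_before_special_sort(sample))  # general-before-special order
```

In plain ASCII order "Information. Economic Properties" files last of the three, because the space ranks before the period; with the temporary spaces it files first.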

The other pitfall to be avoided was a separate ranking of upper- and lower-case letters. In English-language input strings, this had already been limited by imposing initial capitals on all nouns and on all adjectives preceding the nouns that they modified. French-language input strings, by contrast, did not follow this convention, because of the conventions for capitalization familiar to French-speaking readers. For the French-language index, a special routine automatically capitalized the first word following a period in an index string.
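A routine of the kind described for the French-language index might look like the following Python sketch. The function name and the sample French index string are hypothetical, and the sketch assumes that each period is followed by a single space.

```python
import re


def capitalize_after_periods(index_string):
    """Capitalize the first letter of the word following each period."""
    return re.sub(r"(?<=\. )(\w)", lambda m: m.group(1).upper(), index_string)


# Hypothetical French index string, for illustration only.
print(capitalize_after_periods(
    "Information. accès par les citoyens. rôle du gouvernement"))
```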

The separate ranking of upper and lower case was circumvented for the most part by a modification introduced in response to a request by the journal editor. The editor wanted a clear contrast between headings and subheadings, a contrast which is not provided by NEPHIS in its basic form. To achieve this contrast, the index producer wrote a routine to modify the index strings so that every lower-case letter before the first period (or asterisk) was changed to its upper-case equivalent. The results for both sorting and formatting can be seen in the following extract:

INFORMATION IN OFFICES. Control. Computerized System. Experimental Use at B-N Software Research * 5 61-71
INFORMATION NETWORK. World -. Trend. Implications * 1 69-77
INFORMATION OF GOVERNMENT OF CANADA. Access by Citizens of Canada * 5 1-9
INFORMATION ON LAW. Control. Computerized System. Use in Quebec * 3 36-43
INFORMATION ON SCIENCE & TECHNOLOGY * SEE STI
INFORMATION OVERLOAD. Concepts. Applicability to Study of Processing of Information by Human Beings * 1 59-64
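The heading routine behind this extract can be approximated by a minimal Python sketch. The function name capitalize_heading is invented for illustration. The sketch assumes that an index string containing neither a period nor an asterisk is capitalized in full.

```python
import re


def capitalize_heading(index_string):
    """Change to upper case every letter before the first period or asterisk."""
    delimiter = re.search(r"[.*]", index_string)
    end = delimiter.start() if delimiter else len(index_string)
    return index_string[:end].upper() + index_string[end:]


print(capitalize_heading(
    "Information on Law. Control. Computerized System. Use in Quebec * 3 36-43"))
print(capitalize_heading("Information on Science & Technology * SEE STI"))
```

Because all headings end up in a single case, the separate ranking of upper- and lower-case letters no longer affects the filing order of the headings themselves.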
{ 210}

Accents, omitted from the English index, posed a special problem in French. Fortunately, they could be stripped from the headings, because French readers do not expect accents on capital letters. The stripping was done by the same routine that converted lower-case letters to upper case. In the subheadings, backspace codes were automatically introduced into the index file to allow the accents to be printed on the appropriate characters, rather than separately as they had been entered. The cedilla (as in "française") and the dieresis (as in "centroïdes") were added by hand to the camera-ready copy.
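A present-day equivalent of the accent stripping in headings might be sketched in Python as follows. This is not the original routine, which worked on accents entered as separate characters; the sketch instead assumes Unicode text and uses the standard unicodedata module. The function name and the sample French index string are hypothetical.

```python
import re
import unicodedata


def plain_capital_heading(index_string):
    """Capitalize the heading and strip its accents; leave subheadings alone."""
    delimiter = re.search(r"[.*]", index_string)
    end = delimiter.start() if delimiter else len(index_string)
    # Decompose accented letters into base letter plus combining mark,
    # then discard the combining marks.
    decomposed = unicodedata.normalize("NFD", index_string[:end])
    heading = "".join(c for c in decomposed if not unicodedata.combining(c))
    return heading.upper() + index_string[end:]


# Hypothetical French index string, for illustration only.
print(plain_capital_heading("Systèmes informatisés. Utilisation"))
```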

A feature which might have improved the index displays, but which was not introduced, is the display of a heading only once when it is common to more than one index entry. The repetition of the lead term "INFORMATION" is especially noticeable, for example.
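Such a feature could be approximated along the lines of the following Python sketch. It was not part of the CJIS index production; the function name and the indented layout for repeated headings are assumptions made purely for illustration. The sketch treats everything before the first period or asterisk as the heading and expects the entries to be already sorted.

```python
import re


def suppress_repeated_headings(entries):
    """Show a heading only on the first of several consecutive entries."""
    displayed = []
    previous_heading = None
    for entry in entries:
        heading = re.split(r"\s*[.*]", entry, maxsplit=1)[0]
        if heading == previous_heading:
            # Keep only what follows the heading, indented beneath it.
            remainder = entry[len(heading):].lstrip(" .")
            displayed.append("  " + remainder)
        else:
            displayed.append(entry)
        previous_heading = heading
    return displayed


entries = [
    "INFORMATION * SEE ALSO Data",
    "INFORMATION. Economic Properties (7)",
    "INFORMATION. Quantitative -. Theory (6)",
    "INFORMATION SCIENCE. Concepts (15)",
]
for line in suppress_repeated_headings(entries):
    print(line)
```

Only identical headings are merged in this sketch; a lead term shared by different headings, such as "INFORMATION" in "INFORMATION SCIENCE", would still be repeated.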

C.4 Index to the 17 articles discussed

For those interested, here are the index strings, and relevant cross-references, for the 17 articles discussed above. Sorting and formatting are as for the full five-year index. The locators, on the other hand, are not page references to the actual articles, but numbers referring to the discussions in this appendix.

ACCESS TO INFORMATION OF GOVERNMENT OF CANADA BY CITIZENS
  OF CANADA (1)
ANOMALOUS STATES OF KNOWLEDGE. Concept. Applicability
  to Retrieval of Data (12)
ASSOCIATIVE PROCESSING. Indexing (8)
AUTOMATED * SEE ALSO Computerized
AXES FOR CLUSTERING OF TERMS. Definition. Use of Factor
  Analysis (16)
B-N SOFTWARE RESEARCH. Experimental Use of Computerized
  System for Control of Information in Offices (5)
BIBLIOGRAPHIC CONTROL OF BOOKS IN FRENCH LANGUAGE RELATING
  TO HISTORY OF CANADA CITED IN NATIONAL UNION CATALOG
  PRE-1956 IMPRINTS. Role of WHSTC Project (10)
BOOKS IN FRENCH LANGUAGE RELATING TO HISTORY OF CANADA
  CITED IN NATIONAL UNION CATALOG PRE-1956 IMPRINTS.
  Bibliographic Control. Role of WHSTC Project (10)
CANADA. Government. Information. Access by Citizens
  of Canada (1)
CANADA. History. Books in French Language cited in National
  Union Catalog Pre-1956 Imprints. Bibliographic Control.
  Role of WHSTC Project (10)
CANADA. Public Libraries. Small -. Use of Computerized
  Systems (13) { 211}
CITIZENS OF CANADA. Access to Information of Government
  of Canada (1)
CLASSIFICATION * SEE ALSO Clustering
CLUSTERING OF TERMS. Axes. Definition. Use of Factor
  Analysis (16)
COMPUTERIZED CONFERENCING (2)
COMPUTERIZED CONFERENCING * SEE ALSO Mental Workload
  (Electronic Journal)
COMPUTERIZED MIS IN LIBRARIES (14)
COMPUTERIZED SYSTEM FOR CONTROL OF INFORMATION IN OFFICES.
  Experimental Use at B-N Software Research (5)
COMPUTERIZED SYSTEMS. Use in Small Public Libraries
  in Canada (13)
COMPUTERIZED SYSTEMS FOR CONTROL OF VISUAL INFORMATION
  (4)
CONTROL. Bibliographic - of Books in French Language
  relating to History of Canada cited in National
  Union Catalog Pre-1956 Imprints. Role of WHSTC Project
  (10)
CONTROL OF INFORMATION IN OFFICES. Computerized System.
  Experimental Use at B-N Software Research (5)
CONTROL OF VISUAL INFORMATION. Computerized Systems
  (4)
DATA. Retrieval. Applicability of Concept of Anomalous
  States of Knowledge (12)
DATABASES. Textual -. Machine-readable -. Use & Literacy.
  Effects on Processing of Information by Human Beings
  (11)
ECONOMIC PROPERTIES OF INFORMATION (7)
EIES * SEE ALSO Mental Workload (Electronic Journal)
EIES. Personal Experiences (2)
ELECTRONIC JOURNALS * SEE ALSO Mental Workload (Electronic
  Journal)
FACTOR ANALYSIS. Use in Definition of Axes for Clustering
  of Terms (16)
FREEDOM OF INFORMATION * SEE ALSO Access to Information
FRENCH LANGUAGE. Books relating to History of Canada
  cited in National Union Catalog Pre-1956 Imprints.
  Bibliographic Control. Role of WHSTC Project (10)
GOVERNMENT OF CANADA. Information. Access by Citizens
  of Canada (1)
HI-OVIS (4)
HISTORY OF CANADA. Books in French Language cited in
  National Union Catalog Pre-1956 Imprints.
  Bibliographic Control. Role of WHSTC Project (10)
HUMAN BEINGS. Processing of Information. Effects of
  Literacy & Use of Machine-readable Textual Databases
  (11)
INDEXING FOR ASSOCIATIVE PROCESSING (8)
INFORMATION * SEE ALSO Data
INFORMATION. Economic Properties (7)
INFORMATION. Processing by Human Beings. Effects of
  Literacy & Use of Machine-readable Textual Databases
  (11) { 212}
INFORMATION. Quantitative -. Theory (6)
INFORMATION. Visual -. Control. Computerized Systems
  (4)
INFORMATION IN OFFICES. Control. Computerized System.
  Experimental Use at B-N Software Research (5)
INFORMATION OF GOVERNMENT OF CANADA. Access by Citizens
  of Canada (1)
INFORMATION SCIENCE. Concepts (15)
INFORMATION SCIENCE. Quantitative Aspects. Applicability
  of Concept of Information Space (17)
INFORMATION SPACE. Concept. Applicability to Quantitative
  Aspects of Information Science (17)
LIBRARIES. Computerized MIS (14)
LIBRARIES. Public - in Canada. Small -. Use of Computerized
  Systems (13)
LITERACY & USE OF MACHINE-READABLE TEXTUAL DATABASES.
  Effects on Processing of Information by Human Beings
  (11)
MACHINE-READABLE TEXTUAL DATABASES. Use & Literacy.
  Effects on Processing of Information by Human Beings
  (11)
MEETINGS. Substitution of Telecommunications * SEE ALSO
  Computerized Conferencing
MENTAL WORKLOAD (ELECTRONIC JOURNAL) (4)
MIS. Computerized - in Libraries (14)
NATIONAL UNION CATALOG PRE-1956 IMPRINTS. Books in French
  Language relating to History of Canada cited. Bibliographic
  Control. Role of WHSTC Project (10)
OFFICES. Information. Control. Computerized System.
  Experimental Use at B-N Software Research (5)
OPTICAL FIBRES (4)
PUBLIC LIBRARIES IN CANADA. Small -. Use of Computerized
  Systems (13)
QUANTITATIVE ASPECTS OF INFORMATION SCIENCE. Applicability
  of Concept of Information Space (17)
QUANTITATIVE INFORMATION. Theory (6)
RETRIEVAL OF DATA. Applicability of Concept of Anomalous
  States of Knowledge (12)
SEMIOTIC MODEL OF TRANSLATION (9)
SUBSTITUTION OF TELECOMMUNICATIONS FOR MEETINGS * SEE
  ALSO Computerized Conferencing
TELECOMMUNICATIONS. Substitution for Meetings * SEE
  ALSO Computerized Conferencing
TERMS. Clustering. Axes. Definition. Use of Factor Analysis
  (16)
TEXTUAL DATABASES. Machine-readable -. Use & Literacy.
  Effects on Processing of Information by Human Beings
  (11)
TRANSLATION. Semiotic Model (9)
VISUAL INFORMATION. Control. Computerized Systems
  (4)
WHSTC PROJECT. Role in Bibliographic Control of Books
  in French Language relating to History of Canada
  cited in National Union Catalog Pre-1956 Imprints
  (10)
