LIS 677 - Thesaurus project examples

Acknowledgement of sources

Failing to acknowledge sources at all is the most serious error you can make here. You should also take care to give sufficiently full information. For example, the date is missing from the following citation:
Thesaurus of ERIC Descriptors (13th ed) ed. by James E. Houston, Oryx P.

Facet question

The following facet question demonstrates several problems:
"How does UWO plan to accommodate increases in student applications?"

First, its scope seems a little narrow: although there might be a number of relevant articles at a particular time, such as during planning for the "double cohort" in 2003, there may be many issues of Western News in which no relevant articles appear.

It is also not very clear: more guidance than a single short sentence is usually desirable for indexers.

Finally, it lacks unity. It does not specify a single concept category: what kind of word or phrase is expected as an answer? Nor does it specify a role in relation to the item indexed. In fact, it does not even refer to the item indexed. It seems more like a reference question that the student thinks might be answered by certain articles.

The following question is somewhat better, but still presents difficulties:

"What is the process associated with the article?"
It does identify a concept category ("process") and does refer to the item indexed ("the article"). On the other hand, it is still very short, and the expression "associated with" is vague and calls for further explanation.

The following brief facet question is well formed:

"What University of Western Ontario faculties or departments are named in the article?"
It does result in a very shallow hierarchy, however, with only two levels. It could be improved by including more specific levels, such as programs and chairs, in addition to faculties and departments.
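
The depth of a hierarchy can be checked mechanically. The following is a minimal Python sketch, assuming the hierarchy is stored as nested dictionaries in which each term maps to its narrower terms; the term names are hypothetical placeholders, not terms from an actual student thesaurus.

```python
# Hypothetical hierarchy: each key is a term, each value holds its narrower terms (NT).
# The term names are illustrative placeholders only.
hierarchy = {
    "FACULTY A": {
        "DEPARTMENT A1": {},
        "DEPARTMENT A2": {},
    },
    "FACULTY B": {
        "DEPARTMENT B1": {},
    },
}

def depth(tree):
    """Return the number of levels in a nested-dict hierarchy (0 if it is empty)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())

print(depth(hierarchy))  # 2
```

Adding a level of programs beneath each department would raise the result to 3.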


Discussion

Remember that what is called for is "a discussion of up to 500 words on problems encountered in constructing the thesaurus". Discussion should cover a substantial number of relevant points.

General observations, such as the following, are not needed:

A thesaurus is a list of terms and concepts comprising the specialized vocabulary of [a] particular field, showing synonymous, hierarchical, and other relationships of meaning...

Your discussion should also not repeat information that belongs in your acknowledgement of sources; for example,

I collected the terms for my thesaurus from two UWO web pages: "Western Facts 1999" and "Western Academic Calendar 2000".

Here is an example of the kind of material that should be included in the discussion:

Another problem was the desire to include inverted terms. In the beginning, several of the terms appeared in inverted format. I found that rather than attempt to reform the terms, often the best solution was to find an entirely different term to describe the concept. For example, "Band Concerts - Jazz" was a term that appeared in an early version of the thesaurus. Instead of inverting the words and constructing a 3-word term, I changed the term to "Jazz performances". This term is better...

Hierarchical display

The terms need to fit the facet question. If the facet question is well worded, all the terms should belong to the same category. In the following example, the student started with quite a good facet question:
"Research into which branch of knowledge, in the Natural or Physical Sciences, theoretical or applied, is represented in the article?"
But problems are evident in the hierarchical display:
            NT HEART
            NT CANCER
                   NT BREAST
The higher terms in the hierarchy fit the category specified by the question; but, mixed in among the lower terms, we find diseases ("DISEASES", "CANCER", "OSTEOPOROSIS"), parts of the body ("HEART", "BREAST") and an adjective ("CLINICAL"). Mixing categories in this way also frequently leads to illegitimate hierarchical links; for example, the heart is not a part or kind of cardiology and "HEART" should therefore not be entered as a narrower term under "CARDIOLOGY".
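
A check of this kind can be sketched in Python, on the assumption that the thesaurus builder has labelled each term by hand with its concept category. The labels below follow the example above; the `CATEGORY` table and the function name are hypothetical and not part of any thesaurus construction package.

```python
# Hand-assigned category labels (an assumption for illustration).
CATEGORY = {
    "CARDIOLOGY": "branch of knowledge",
    "HEART": "body part",
    "CANCER": "disease",
    "BREAST": "body part",
}

def off_category_terms(terms, expected):
    """Return the terms whose label differs from the expected category.

    Terms with no label at all are also returned, since they cannot be confirmed.
    """
    return [t for t in terms if CATEGORY.get(t) != expected]

terms = ["CARDIOLOGY", "HEART", "CANCER", "BREAST"]
print(off_category_terms(terms, "branch of knowledge"))
# ['HEART', 'CANCER', 'BREAST']
```

Any term the check returns either needs rewording to fit the category named in the facet question or does not belong in the hierarchy.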

Coverage should be reasonably balanced. For example, in assessing the following hierarchy, you need to ask yourself whether basketball, hockey, and tae kwon do will turn up sufficiently often when compared to other intramural sports that they warrant their own terms while the other sports do not:

               INTRAMURAL HOCKEY
               TAE KWON DO CLUB
Incidentally, you may also notice here the problem that "TAE KWON DO CLUB" does not belong to the same category as the other terms and is therefore not appropriate, though "INTRAMURAL TAE KWON DO" might be.

Make terms equivalent where appropriate. For instance, in the following hierarchical display, the terms "CHAIR", "CHAIRMAN", and "CHAIRWOMAN" should all have been nonpreferred terms for "CHAIRPERSON":
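
The corrected equivalence structure might be sketched in Python as follows. This is an illustration only, assuming a simple dictionary representation; the names `USED_FOR`, `USE`, and `preferred` are hypothetical.

```python
# Preferred term mapped to its nonpreferred ("used for", UF) terms.
USED_FOR = {
    "CHAIRPERSON": ["CHAIR", "CHAIRMAN", "CHAIRWOMAN"],
}

# Invert the mapping so that each nonpreferred term points to its preferred term (USE).
USE = {
    entry: pref
    for pref, entries in USED_FOR.items()
    for entry in entries
}

def preferred(term):
    """Return the preferred term for an entry term; other terms are returned unchanged."""
    return USE.get(term, term)

print(preferred("CHAIRMAN"))     # CHAIRPERSON
print(preferred("CHAIRPERSON"))  # CHAIRPERSON
```

With this arrangement only "CHAIRPERSON" appears in the hierarchy, and the other three forms serve purely as entry points.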



Avoid using adjectives as terms. For example, in the following display, it would have been better to say "FEDERAL GOVERNMENT", "MUNICIPAL GOVERNMENT", "PROVINCIAL GOVERNMENT":



Even when the terms all fit a well-worded facet question, you should make sure that all the hierarchical relations are correct. Take the following example, where the facet question has to do with positions to which people can be appointed or elected:

Although it may be argued that an acting dean is a kind of dean, assistant deans and associate deans are not deans, so the terms "ASSISTANT DEAN" and "ASSOCIATE DEAN" do not belong under "DEAN".

Alphabetical display

If you use a thesaurus construction package, consistency with the hierarchical display and internal consistency of the alphabetical display should not be problems.

The following is an example of a badly constructed scope note:

Not only does it contain two spelling errors ("GRAND" for "GRANT" and "AND" for "AN"), but it also gives a definition in the infinitive form, which is not suitable for noun phrases.

The following scope note is bad for a different reason:

     SN Use for both a process and an item.
It mixes the two categories "process" and "item". The term should belong to a single category, in this case "process", as identified in the original facet question.

Indexed articles

Make sure that you indicate your answer to the facet question in your own words. One way to do this is to give the answer in an indicative sentence, followed by the thesaurus terms; for example,
This article discusses the climatological research into long-term weather and climate changes, using ecological methodologies.


Another way is by providing a table that compares your words with the thesaurus terms; for example,

Actual Words                          Thesaurus Terms

UWO ombudsperson                      [ombudsperson]
UWO President                         [president]
Acting Dean of Engineering Science    [dean]
Deans' ...                            [dean]

Do not synthesize terms unless you have included explicit instructions for doing so in your thesaurus. In the following example, the student has synthesized a term out of the three thesaurus terms "ATHLETIC EVENTS", "BANQUETS", and "AWARDS", even though no instructions for synthesis have been provided:

We also realize that this synthesis is quite inappropriate when we look at the hierarchical display and see that "AWARDS" and "BANQUETS" are identified as narrower terms to "ATHLETIC EVENTS" and that there is actually a single term "AWARDS BANQUETS" that fits the desired meaning:


Last updated June 7, 2006.
This page maintained by Prof. Tim Craven
Faculty of Information and Media Studies
University of Western Ontario,
London, Ontario
Canada, N6A 5B7