LIS 523 - Some Notes on Using XRefHT32

When you use the "Suggest keywords" function in XRefHT32's Editing window, the suggestions you get back depend on which TheW32 thesaurus you have open and on TheW32's current stoplist and suffix list.

At one extreme, there may be no matches and you may thus get no suggestions added to the list of keywords. This could be because the text of the page is too short or because the coverage of the thesaurus is too restricted (for example, if you use the sample thesaurus craven.the on any subject matter outside indexing and information retrieval).

You can deal with this extreme by making the text of your pages more substantial and by adding important words and phrases from them to the thesaurus.

At the other extreme, you may get a number of matches that seem to make no sense and that you therefore have to eliminate. This is especially likely to happen if the TheW32 thesaurus is large and not particularly well suited to the subject matter of your pages (for example, the sample thesaurus tgm.the if your pages are dealing with a lot of abstract concepts).

Here, it may help you to know why the false matches are occurring. Sometimes, the reason is simple over-stemming, the excessive removal of suffixes in determining matches. Some examples using the Thesaurus for Graphic Materials:
Term suggested   Word in text      Suffixes
mobiles          mobility          -es, -ity
police           policy            -e, -y
contentment      content           -ment
parties          part              -ies
stats            states, stated    -ed, -es, -s
fairies          fairly            -ies, -ly
runes            run               -es
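TheW32's exact stemming rules are not spelled out here, so the following is only a hypothetical Python sketch of how suffix stripping can over-stem. It assumes that the suffix list is a plain list of endings (taken from the Suffixes columns of these notes), that the longest matching ending is removed once, and that at least three letters must remain; the names SUFFIXES, MIN_STEM and stem are illustrative. Under those assumptions, every text word in the table reduces to the same stem as the term suggested.

```python
# Hypothetical model of suffix stripping; NOT TheW32's documented algorithm.
# SUFFIXES stands in for suffixes.txt, using the endings listed in these notes.
SUFFIXES = ["ions", "ment", "ies", "ity", "ing", "es", "ed", "ly", "al", "s", "e", "y"]
MIN_STEM = 3  # assumed: a stem must keep at least 3 letters


def stem(word: str) -> str:
    """Strip the longest listed suffix once, keeping at least MIN_STEM letters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


# (term suggested, word in text) pairs from the over-stemming table
PAIRS = [
    ("mobiles", "mobility"), ("police", "policy"), ("contentment", "content"),
    ("parties", "part"), ("stats", "states"), ("stats", "stated"),
    ("fairies", "fairly"), ("runes", "run"),
]

for term, text_word in PAIRS:
    # Each pair collapses to one stem, which is why the term gets suggested.
    print(f"{term:12} {text_word:10} stems: {stem(term)!r} / {stem(text_word)!r}")
```

Removing the longest ending first matters here: with a shortest-first rule, "states" would lose only "-s" and become "state", and would no longer collide with "stats".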

Sometimes a word or phrase in the text will exactly match a non-preferred term in the thesaurus, but with a different meaning; for example:
Term suggested     Word in text
birth defects      abnormalities
dead animals       bodies
children playing   play
conspiracy         plot
feathers           down
journalism         editing
physical fitness   exercise
farm produce       produce
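A non-preferred term acts as a pointer to its preferred term, so a match on it returns the preferred term no matter what sense the word has on the page. The hypothetical Python sketch below models this with a plain dictionary built only from the pairs in the table; the function name suggest and its punctuation handling are illustrative assumptions, and no stemming is applied so that the effect of the non-preferred terms is isolated.

```python
# Hypothetical thesaurus fragment: non-preferred term -> preferred term.
# Entries come from the table above; this is not real TGM data.
USE = {
    "abnormalities": "birth defects",
    "bodies": "dead animals",
    "play": "children playing",
    "plot": "conspiracy",
    "down": "feathers",
    "editing": "journalism",
    "exercise": "physical fitness",
    "produce": "farm produce",
}


def suggest(text: str) -> list[str]:
    """Return preferred terms for words that exactly match a non-preferred term."""
    suggestions = []
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")  # drop surrounding punctuation
        preferred = USE.get(word)
        if preferred and preferred not in suggestions:
            suggestions.append(preferred)
    return suggestions


print(suggest("The server went down while we were editing the plot summary."))
```

Every suggestion for that sentence is a false match: the lookup sees only the string, not the sense.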

Finally, the reason is often a combination of over-stemming and matching of non-preferred terms; for example:
Term suggested          Non-preferred term   Word in text   Suffixes
dwellings               homes                home           -e, -es
contract laborers       coolies              cool           -ies
film stills             stills               still          -s
parades & processions   processions          processed      -ed, -ions
coins                   specie               special        -al, -e
reminiscing             recalling            recall         -ing
meadows                 fields               field          -s
stoves                  ranges               range          -e, -es
fairies                 brownies             Browne         -e, -ies
magic                   spells               spelling       -ing, -s

Possible solutions include removing problematic preferred or non-preferred terms from the thesaurus, removing suffixes from TheW32's suffixes.txt file, or adding common words that cause false matches (such as "down") to TheW32's stoplist.txt file.
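To compare these remedies, the following hypothetical Python sketch chains a stoplist, a suffix list and a thesaurus in which every entry term (preferred or non-preferred) maps to a preferred term. It is a simplified model, not XRefHT32's actual matching logic: it assumes single-word entries and the same longest-suffix-once stemming rule as a stand-in for suffixes.txt, and the names suggest_keywords, THESAURUS, STOPLIST and TEXT are illustrative only.

```python
# Hypothetical end-to-end model of "Suggest keywords"; real behaviour may differ.


def stem(word, suffixes, min_stem=3):
    """Strip the longest listed suffix once, keeping at least min_stem letters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def suggest_keywords(text, thesaurus, stoplist, suffixes):
    """thesaurus maps each entry term (preferred or non-preferred) to its preferred term."""
    index = {stem(term, suffixes): preferred for term, preferred in thesaurus.items()}
    suggestions = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if not word or word in stoplist:
            continue  # stoplist words are never matched
        preferred = index.get(stem(word, suffixes))
        if preferred and preferred not in suggestions:
            suggestions.append(preferred)
    return suggestions


# Fragment modelled on the examples in these notes.
THESAURUS = {
    "dwellings": "dwellings", "homes": "dwellings",  # homes is non-preferred
    "feathers": "feathers", "down": "feathers",      # down is non-preferred
    "coins": "coins", "specie": "coins",             # specie is non-preferred
}
SUFFIXES = ["ions", "ment", "ies", "ity", "ing", "es", "ed", "ly", "al", "s", "e", "y"]
STOPLIST = {"the", "a", "on", "for", "our"}
TEXT = "Visit our home page for special offers on down jackets."

print(suggest_keywords(TEXT, THESAURUS, STOPLIST, SUFFIXES))
# Remedy 1: add "down" to the stoplist
print(suggest_keywords(TEXT, THESAURUS, STOPLIST | {"down"}, SUFFIXES))
# Remedy 2: remove "-e" from the suffix list
print(suggest_keywords(TEXT, THESAURUS, STOPLIST, [s for s in SUFFIXES if s != "e"]))
# Remedy 3: remove the non-preferred term "homes" from the thesaurus
print(suggest_keywords(TEXT, {k: v for k, v in THESAURUS.items() if k != "homes"}, STOPLIST, SUFFIXES))
```

In this model, removing "-e" eliminates both the homes/home and specie/special false matches at once, but it changes the stemming of every word ending in e, so a page that really uses "home" in the sense of a dwelling would lose that suggestion too. Adding "down" to the stoplist, by contrast, affects only that one word.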


Last updated November 29, 2004.
This page maintained by Prof. Tim Craven
E-mail (text/plain only):
Faculty of Information and Media Studies
University of Western Ontario,
London, Ontario
Canada, N6A 5B7