LIS 677 - Weekly assignments and preparation
Reports and prepostings are due by 9 am on Wednesdays.
Comments should be submitted by the end of the day on Thursdays.
For more details on postings,
see LIS 677 - Discussions online.
You should submit only 7 of the weekly reports.
The assignments from January 16-17 through March 5-6
will, to varying degrees, prepare you
to do the final thesaurus construction project.
Other assignments, by contrast,
tend to complement the final thesaurus construction project
by providing insight into other aspects of subject analysis.
Comment: Briefly introduce yourself.
What is your background in relation to indexing, thesauri, abstracting,
online searching, or database management,
as a user, student, or practitioner?
Also, what is your answer to one of the following questions?
- What is an index?
- Is a table of contents a kind of index?
- How are indexes related to library catalogs?
Preparation: Find a
nonfiction book that is of interest to you
and that does not presently have an index.
Start trying to index it.
Record your efforts.
Report: Give the title of your book.
List the types of decision
you found you had to make in your attempt at indexing.
Give examples or submit your notes with annotations.
Suggest general principles
that could govern decision-making
both in this indexing task and in indexing in general.
- Remember that a completed index is not the objective
of this assignment.
- For types of decision,
consider the whole range of activities
that might be involved in book indexing.
Don't concentrate too much on a single type of decision,
such as which terms to use.
- Principles suggested should be general,
not specific guidelines such as where to place commas.
- Use of sources is optional.
You might find it best to try the indexing task first
and then look at sources later,
especially to provide ideas for principles.
Preposting: Give the title of the book you started to index.
How did you find it?
Why do you think it did not have an index?
What was one of the decisions that you had to make?
What principle of indexing might help with that kind of decision?
Comment on a point raised by one other student.
Study the definitions
in the file 677def.htm.
Report: Give clear illustrations
of two of the definitions in section A.
Include any explanation needed
to link features of your illustrations
to the definitions.
Do the same for three of the definitions in section B.
- Here, "example"
could be another word for "illustration".
- You can take illustrations from textbooks
or from actual indexing and retrieval tools,
or make them up yourself.
- Regardless of the origin of your illustrations,
they should be clear.
- They should also fit the definitions given in the list.
Note that some of the terms, such as "relevance",
may have other definitions in the literature.
Post an illustration of one of the terms.
This may take the form of a link to an image on the Web.
What parts of a journal article do you think would be most useful
in determining the article's subject?
Look at 2 of the articles listed
in the file 677eri.htm.
Without looking at how the documents have been indexed,
index them yourself
using a recent edition of the ERIC thesaurus.
Then compare the indexer's results
in the file
Identify the 2 documents.
For each, indicate what you thought it was about,
what ERIC descriptors (major and minor)
you assigned to it,
which of these, if any,
turned out to be illegal
(not permitted by the ERIC thesaurus),
and what descriptors were assigned to it by the indexer.
For each of the 2 documents,
note good and bad points about your choices
and those of the indexer.
(About 500 words apart from the lists of descriptors.)
- Remember to do everything
that the assignment asks for;
include what you thought each document was about
and mark clearly which of your descriptors are major
and which minor.
- If you don't have enough to say
about how well you and the indexer chose
from the terms in the ERIC thesaurus,
you may discuss the strengths and weaknesses
of the ERIC thesaurus in providing suitable terms.
Comment on the way the ERIC indexer and you
treated one aspect of the subject of one of the documents.
Apart from Western News itself,
what sources of terms do you think would be useful
for the course project?
Select a text in ASCII format
(you may use a series of related texts
concatenated into a single file).
Use ExtPhr32 to extract frequently occurring words and phrases.
Describe briefly the source text used.
List about 15 extracted words and phrases.
For each, comment on its usefulness
(1) as an index term for the source text as a whole,
and (2) as a term for inclusion in a thesaurus
to be used to index similar texts.
Would adjustments to the stoplist
or to the threshold that you employed
have improved the results?
- You can save the text of a Web page in plain ASCII format
from a typical browser
by selecting "File|Save (Page) As...",
setting "Save as type" to "Text File(s)",
and giving the file the extension .txt.
- Note that you are to discuss the usefulness
of the terms as index terms for indexing the whole text,
as in database indexing,
not parts of it, as in book indexing.
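The effect of the stoplist and the threshold can be explored outside ExtPhr32 with a rough sketch like the one below. It is not ExtPhr32's algorithm: the stoplist, the function name extract_terms, and the rule that a phrase may not begin or end with a stoplist word are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stoplist; ExtPhr32's actual stoplist is not reproduced here.
STOPLIST = {"the", "of", "and", "a", "to", "in", "is", "for", "on", "an"}

def extract_terms(text, stoplist=STOPLIST, threshold=2, max_words=3):
    """Return words and phrases that occur at least `threshold` times.

    A candidate is a run of 1..max_words consecutive words that neither
    starts nor ends with a stoplist word. Punctuation is ignored, so a
    candidate may span a sentence boundary (a simplification).
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in stoplist or gram[-1] in stoplist:
                continue
            counts[" ".join(gram)] += 1
    return {term: c for term, c in counts.items() if c >= threshold}

sample = ("Subject indexing of the news. Subject indexing is hard. "
          "The news needs subject indexing and a thesaurus.")
terms = extract_terms(sample, threshold=2)
stricter = extract_terms(sample, threshold=3)
```

Raising the threshold from 2 to 3 drops "news" from the sample results, which is the kind of trade-off the report asks you to discuss.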
Describe briefly the source text you used.
For one of the extracted words or phrases,
comment on its usefulness
as an index term for the source text
and as a term for inclusion in a thesaurus.
Suggest one facet for the descriptions
in the file 677fac03.htm.
Give it a mnemonic name and a question about the document
and list words or phrases in the descriptions
that fit the facet.
Your suggestion may be a new facet
or a modification of someone else's suggestion.
Study the item descriptions to be facet-analyzed
in the file 677fac.htm
and attempt to facet-analyze them.
Suggest five possible facets
by indicating for each a mnemonic name
and the general question about the item indexed
that the facet answers.
Indicate which of the words/phrases in each description
belong to each of your five facets
(this may be done by marking up a copy of the file).
- Avoid vague or general facet names
such as "aspect" or "subject".
- Be careful how you word your facet question.
It will form the basis
for evaluating the main part of the assignment,
which is identifying words and phrases
that fit the facet.
A facet question should indicate a category,
such as "person" or "action",
and a role to be played relative to the indexed item,
usually in its subject.
- Identify specific words and phrases
within each description;
don't just identify which descriptions contain answers
to a facet question.
Comment on a particular difficulty that might be experienced
in facet analyzing the specific set of descriptions.
Note one problem with the tentative facet analysis
in the file 677facan.htm
when it is posted.
Study the sets of equivalent terms
in the file 677equ.htm.
Think about what other terms might be added to the sets,
and about which term should be the preferred term
for each set and why.
For one of the equivalent-term sets of your choice,
indicate one additional equivalent term
and your preferred term.
Discuss in some detail (about 500 words)
the pros and cons of your preferred-term choice.
- Bear in mind the various principles
for choosing preferred terms:
usage, breadth, disambiguation, collocation,
conciseness, plural for countable objects.
- Take account
of other existing controlled vocabularies
that may be relevant.
- Your suggested additional term
should fit reasonably well
within the bounds of meaning
suggested by the terms already in the set.
Without looking at what other students have done,
list your preferred-term choices for all ten items.
Have your preferred-term choices for any of the ten items
changed after looking at other students' choices?
Why or why not?
Review the definitions
of the symbols BT, NT,
RT, and SN;
you may also want to look at sections 5-7
of the thesaurus construction tutorial.
Do the thesaurus-construction exercise
in the file 677sem.htm
using the form illustrated in the example;
use all four symbols.
In determining BT/NT relations,
use only genus/species and hierarchical whole/part.
Submit your hierarchical display
for the thesaurus-construction exercise,
the corresponding alphabetical display,
and a list of any sources used.
- Remember that all the terms
are supposed to be preferred terms.
"USE" and "UF" references
will not be required.
- Scope notes should be appropriate
to the meanings that you want for the terms.
They will be used
in evaluating the appropriateness
of your hierarchical structure.
- Consistency between the two displays is important.
It is much easier to achieve consistency
if you use thesaurus-construction software
for this assignment.
- An excessively deep hierarchy is a good sign
that one or more of your links needs rethinking.
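One reason software helps with consistency is that each relation can be stored once and both displays generated from it. The sketch below illustrates that idea only; it is not how TheW32 works, and the terms, scope note, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical terms, used only for illustration; not the exercise terms.
broader = {            # term -> its broader term (BT)
    "dogs": "mammals",
    "cats": "mammals",
    "mammals": "animals",
}
related = [("dogs", "kennels")]              # RT pairs, stored once
scope_notes = {"kennels": "Shelters for dogs."}

def build_entries(broader, related, scope_notes):
    """Derive every reciprocal link from a single copy of each relation."""
    entries = defaultdict(lambda: {"SN": [], "BT": [], "NT": [], "RT": []})
    for term, bt in broader.items():
        entries[term]["BT"].append(bt)
        entries[bt]["NT"].append(term)       # NT is the reciprocal of BT
    for a, b in related:
        entries[a]["RT"].append(b)
        entries[b]["RT"].append(a)           # RT is symmetric
    for term, note in scope_notes.items():
        entries[term]["SN"].append(note)
    return entries

def alphabetical_display(entries):
    """List each term in alphabetical order with its SN, BT, NT, RT lines."""
    lines = []
    for term in sorted(entries):
        lines.append(term)
        for symbol in ("SN", "BT", "NT", "RT"):
            for value in sorted(entries[term][symbol]):
                lines.append(f"  {symbol} {value}")
    return "\n".join(lines)

def hierarchical_display(entries):
    """Indent each term under its broader term, starting from top terms."""
    lines = []
    def walk(term, depth):
        lines.append("  " * depth + term)
        for nt in sorted(entries[term]["NT"]):
            walk(nt, depth + 1)
    for top in sorted(t for t in entries if not entries[t]["BT"]):
        walk(top, 0)
    return "\n".join(lines)

entries = build_entries(broader, related, scope_notes)
alpha = alphabetical_display(entries)
hier = hierarchical_display(entries)
```

Because every NT line is derived from a BT entry, the two displays cannot disagree about a link; the indentation depth in the hierarchical display also makes an overly deep chain easy to spot.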
Comment on a good or bad point of TheW32
or any other thesaurus construction software
with which you have experience.
Note a problem with the suggested answer to the exercise
in the file 677seman.htm
when it is posted,
or a way in which it would be incorrect
if one or more definitions were changed.
Select a Dialog database
that includes abstract and descriptor fields and a thesaurus.
Devise a suitable query for the database.
Perform two searches for relevant items,
with one search restricted to free-text fields
(title, abstract, full text if available)
and the other restricted to the descriptor field.
The search strategies
may or may not involve Boolean or proximity operators.
For each of the searches, capture 5 records
(or all the records in the retrieved set
if it contains 5 or fewer records);
use a format that includes abstracts (e.g., format 5).
Note any precision failures
and consider possible recall failures in each search.
You may wish to iterate the searches
to find some of the latter.
Submit your search results
for your two original searches.
Include the query, the two search strategies,
and the captured records for each strategy.
In about 500 words,
note any precision failures
and discuss differences in performance,
both precision and recall, between the two approaches
(controlled versus uncontrolled vocabulary).
- When searching Dialog,
do not apply field limiters to sets:
results cannot be relied upon.
- Note precision failures specifically enough
so that the particular documents or meanings referred to
can be identified.
Post your query, the two search strategies,
and the proportion of the records printed for each
that were relevant.
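The proportion asked for here is the precision of each captured set. A minimal sketch of the arithmetic follows, with made-up record identifiers and relevance judgments; the function names are hypothetical. True recall cannot be computed without knowing every relevant record in the database, so the second function only lists candidate recall failures: relevant records that the other search found and this one missed.

```python
def precision(judgments):
    """Proportion of captured records judged relevant.

    `judgments` maps a record identifier to True (relevant) or False.
    """
    if not judgments:
        raise ValueError("no records captured")
    return sum(judgments.values()) / len(judgments)

def possible_recall_failures(this_search, other_search):
    """Relevant records captured by the other search but not by this one."""
    relevant_elsewhere = {rec for rec, ok in other_search.items() if ok}
    return sorted(relevant_elsewhere - set(this_search))

# Hypothetical identifiers and judgments for 5 captured records per search.
free_text = {"R1": True, "R2": False, "R3": True, "R4": False, "R5": True}
descriptor = {"R1": True, "R3": True, "R6": True, "R7": False, "R8": True}

p_free = precision(free_text)
p_desc = precision(descriptor)
missed_by_free_text = possible_recall_failures(free_text, descriptor)
missed_by_descriptor = possible_recall_failures(descriptor, free_text)
```

With only 5 records captured per search, each figure is a rough indication rather than a reliable measurement.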
Comment on the search strategies posted by one other student.
Your preliminary facet question for the course project
should reach the instructor by March 12.
Find 10 HTML files with related subject matter.
Use XRefHT32 to extract titles and any targets
(named anchors) and metatagged keywords.
Using the indexing window of XRefHT32,
assign 1-3 index terms to each file;
convert the results to an HTML index.
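To see what titles, targets, and metatagged keywords look like in a file before indexing it, a short script using Python's standard html.parser module can pull them out. This is a sketch under assumptions, not a description of what XRefHT32 does internally; the class name PageFeatures and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class PageFeatures(HTMLParser):
    """Collect the title, named anchors, and meta keywords of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.targets = []
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("name"):
            # A named anchor is a target; plain links (href only) are skipped.
            self.targets.append(attrs["name"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical page content, for illustration only.
sample = """<html><head><title>Facet analysis notes</title>
<meta name="keywords" content="facets, thesauri, indexing">
</head><body><a name="intro"></a><h1>Intro</h1>
<a href="#intro">back</a></body></html>"""

parser = PageFeatures()
parser.feed(sample)
parser.close()
```

After parsing, parser.title, parser.targets, and parser.keywords hold the three kinds of evidence that the report asks you to evaluate as sources of index terms.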
Give the URL and title for each of the 10 files;
append a brief annotation
if the title is insufficiently specific or accurate.
Comment on the usefulness of titles
and any targets and metatagged keywords
(and, if you wish, headings)
in suggesting suitable index terms.
If titles or targets were not found,
discuss whether they should have been present.
(Maximum 500 words of discussion.)
(The grade for your report will also be based partly
on the quality of your posted index.)
- Remember to clarify the content of each file
if the title is absent or insufficient.
Upload your HTML index to the "Student Uploads" document library
on the course SharePoint site,
using your name as the filename.
Comment briefly on the index uploaded by one other student.
Use SKY Index Professional Demo to generate a printable index
to the descriptions
in the file 677sky.htm,
changing wording as appropriate
while maintaining the essential subjects.
Submit your index in word processor format
(e.g., RTF for WordPad/write.exe).
In about 500 words, note its good and bad features.
Take explicit account
of the criteria of predictability, collocation, clarity,
succinctness, and eliminability.
At the same time,
consider how the index entries would fit
into a cumulative index to items on similar topics.
- The discussion is the more important part
of this assignment.
How you decide to index the descriptions
is fairly open-ended.
- Be sure to include locators,
such as "1", "2", and so on,
in your index,
to show which description has been indexed
by each entry.
- Mention explicitly all the criteria
listed in the assignment,
adding any others you think appropriate.
- Make sure that you understand what is meant
by each of the listed criteria.
Collocation is the placing of similar entries together
and the separation of dissimilar entries.
Eliminability means how quickly a searcher can break off
examining an irrelevant entry.
Comment on a good or bad feature (or lack of a feature) of SKY Index.
Comment on some aspect of the sample NEPHIS index
in the file 677nepan.htm.
Prepare an abstract of the article 677abs.htm
(in the "Shared Documents" document library
on the SharePoint site).
Follow the ANSI/NISO standard for abstracts
in the file 677absin.htm.
Submit your abstract.
Indicate briefly where the abstract is indicative
and where it is informative,
according to the definitions given in the file 677absin.htm.
- Because of the brevity of what you hand in,
every grammatical, spelling, and punctuation error
will count in this assignment.
- Remember to follow the definitions
for "informative" and "indicative"
given in the file 677absin.htm.
What are the key concepts in this article
that should be covered by any abstract?
Comment on two of the article abstracts
in the file 677absan.htm.
Select either an index or other product of indexing
or an abstract journal
or other collection of abstracts,
either in printed form or online or on disk,
that is of interest to you and to which you have access.
Selectively examine the quality
of either the indexing or the abstracting.
Write a report of about 1000 words
on the type and quality
of either indexing or abstracting
found in the product you examined.
Point out problems and good ideas.
- Remember to deal with either indexing or abstracting.
Comment on an interesting feature of an indexing or abstracting product
that you have examined.
Make a brief original comment of your choice
relating to one of the topics of the course.
Final thesaurus project reports
should be received by the instructor by 9 am on April 9.
Last updated September 26, 2007.
This page maintained by
Prof. Tim Craven
E-mail (text/plain only): firstname.lastname@example.org
Faculty of Information and
University of Western
Canada, N6A 5B7