LIS 523/5 - Indexing

General Information

A Web site may be indexed locally (internally, or on-site) or remotely (externally, or off-site). In either case, the index may be built for a search engine or presented in HTML files similar to printed indexes.

Local Indexing

For a Search Engine

A number of server software packages, such as Microsoft FrontPage Extensions, include a keyword search capability. This capability is typically not enabled by default; it has to be turned on, either for the whole site or for specified directories.

For more information on choosing a local search tool for your site, see LIS 523/5 - Search Tools.

HTML

Most sites have only a table of contents, if that. An HTML index that looks like a traditional back-of-the-book index can be helpful, however: it supports browsing and gives a lot of control over the kinds of access points available. Such an index has to be maintained to stay up to date.

If a professional indexer is hired, creating the index might cost $7-$10 per page, depending on page length, depth of indexing, and so on. Additional costs would be incurred for updates.

Indexing software can help. For example, HTML Indexer automatically scans a Web site and generates initial entries for all the pages and named anchors. The human indexer then needs to check through these entries, modify the default text for each one, and make additional entries as needed. Index information can also be embedded in each page.
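To make the idea of automatically generated initial entries concrete, here is a minimal Python sketch. It is not how HTML Indexer itself works; it only assumes that a default entry can be taken from each page's title and from each named anchor (`<a name="...">`). The names `EntryCollector` and `default_entries`, and the sample page, are hypothetical.

```python
from html.parser import HTMLParser


class EntryCollector(HTMLParser):
    """Collect the page title and named anchors as default index entries."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.anchors = []        # values of <a name="..."> in page order
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("name"):
            self.anchors.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def default_entries(url, html):
    """Return (entry text, target) pairs: one for the page, one per named anchor."""
    collector = EntryCollector()
    collector.feed(html)
    title = collector.title.strip() or url   # fall back to the URL if untitled
    entries = [(title, url)]
    for name in collector.anchors:
        entries.append((f"{title}: {name}", f"{url}#{name}"))
    return entries


# Hypothetical sample page, used only for illustration.
page = """<html><head><title>Indexing</title></head>
<body><a name="local">Local Indexing</a> <a name="remote">Remote Indexing</a></body></html>"""

for text, target in default_entries("indexing.htm", page):
    print(f"{text} -> {target}")
```

Default entries of this kind are only a starting point: the entry text comes straight from titles and anchor names, which is why the human indexer still has to reword entries and add new ones.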

Remote Indexing

You can submit your site information to various search services individually, or use a company or software package that will submit this information, en masse, to most of the popular search engines. Some limited services and software packages are free; fees for others range from US$10 to over US$100. Some Web hosting packages include submission to major search engines.

HTML

Some search services, such as Yahoo!, include human-classified listings of submitted pages. These listings are difficult to keep up to date, however, and it may take a while for a submitted page to be classified. Many pages are never classified at all.

For Search Engines

Indexing varies among search engines and their individual spidering programs, both in which pages are indexed and in which features of those pages are indexed. In any case, indexing is by extraction, so the exact words and phrases used on the pages are critical. Here are some questions to ask about a search engine.

Note that you may be able to submit your site to a remote search engine and establish a customized connection to allow visitors to search your site specifically through that search engine. Examples are the Google SiteSearch service (this requires you to agree that Google will be the exclusive provider of search services on your site) and FreeFind.com.
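Returning to the point that search engines index by extraction: the following Python sketch shows why the exact words on a page matter. It is a deliberately simplified illustration, not the method of any particular search engine; the function names and the two sample pages are hypothetical.

```python
import re
from collections import defaultdict


def extract_terms(text):
    """Lowercase the text and return the words exactly as they occur on the page."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each extracted word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for term in extract_terms(text):
            index[term].add(name)
    return index


# Hypothetical page texts, used only for illustration.
pages = {
    "cars.htm": "Used cars for sale",
    "autos.htm": "Pre-owned automobiles available",
}
index = build_index(pages)

print(sorted(index["cars"]))                  # only the page that uses "cars"
print(sorted(index["automobiles"]))           # the synonym is a separate entry
print(sorted(index.get("vehicles", set())))   # used on no page, so nothing is found
```

Because no term is assigned beyond what appears in the text, a visitor searching for "cars" would not find the page that says only "automobiles". This is the practical reason to choose the words and phrases on each page with likely queries in mind.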

For More Information

Local

HTML Indexes

Examples

For links to many more examples, see

Search Engines

(See LIS 523/5 - Search Tools.)

Remote

Search Engines



Last updated April 20, 2007.
This page maintained by Prof. Tim Craven
E-mail (text/plain only): craven@uwo.ca
Faculty of Information and Media Studies
University of Western Ontario,
London, Ontario
Canada, N6A 5B7