LIS 523/5 - Indexing
A Web site may be indexed in several different ways:
- for a search engine, either internal (on-site) or external (off-site)
- in HTML files similar to printed indexes
For a Search Engine
A number of server software packages,
such as Microsoft FrontPage Extensions,
include a keyword search capability.
This is typically not implemented by default,
but has to be turned on,
either for the whole site or for specified directories.
For more information on choosing a local search tool for your site,
see LIS 523/5 - Search Tools.
Most sites have only a table of contents, if that,
but an HTML index
that looks like a traditional back-of-the-book index
can be helpful
and allows a lot of control over the kinds of access points available.
The index needs to be maintained so that it is up to date.
If a professional indexer is hired,
creating the index might cost $7-$10 per page,
depending on page length, depth of indexing, and so on.
Additional costs would be incurred for updates.
Assistance can be provided by indexing software.
For example, HTML Indexer
automatically reads over a Web site
and generates initial entries for all the pages
and named anchors.
The human indexer then needs to check through,
modify the default text for each entry,
and make additional entries as needed.
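A rough sketch of the first, automatic pass such a tool performs — collecting a page's title and named anchors as candidate entries — using Python's standard html.parser (the page content below is invented for illustration; HTML Indexer's actual behavior is more elaborate):

```python
from html.parser import HTMLParser

class EntryScanner(HTMLParser):
    """Collects a page's title and named anchors as candidate index entries."""
    def __init__(self):
        super().__init__()
        self.entries = []          # (entry text, URL fragment) pairs
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and "name" in attrs:
            # The default entry text is just the anchor name; a human
            # indexer would normally edit this into a proper heading.
            self.entries.append((attrs["name"], "#" + attrs["name"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.entries.append((data.strip(), ""))

def initial_entries(page_name, html):
    """Return (entry text, target URL) pairs generated from one page."""
    scanner = EntryScanner()
    scanner.feed(html)
    return [(text, page_name + frag) for text, frag in scanner.entries]

page = ('<html><head><title>Site Indexing</title></head>'
        '<body><a name="spiders"></a>Spiders...</body></html>')
print(initial_entries("indexing.html", page))
```

The human indexer's work then consists of editing these default entries and adding the entries the scan could not infer.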
Index information can also be embedded in each page.
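Embedded index information usually takes the form of META tags in the page head; a minimal illustration (the keyword and description values here are invented for the example):

```html
<head>
  <title>Web Site Indexing</title>
  <meta name="keywords"
        content="indexing, Web sites, search engines, back-of-the-book indexes">
  <meta name="description"
        content="Notes on internal and external indexing of a Web site.">
</head>
```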
You can submit your site information
to various search services individually
or use a company or software package
that will submit this information, en masse,
to most of the popular search engines
(there are some limited free services
or software packages;
fees for other services or packages range from US$10
to over US$100;
some Web hosting packages
include submission to major search engines).
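A mass-submission package essentially automates filling in each engine's "add URL" form. A minimal sketch of the idea in Python (the endpoint URLs below are placeholders, not real services):

```python
from urllib.parse import urlencode

# Hypothetical "add URL" endpoints; each real engine publishes its own form.
ENDPOINTS = {
    "ExampleSearch": "http://search.example.com/addurl?",
    "DemoSpider":    "http://spider.example.net/submit?",
}

def submission_urls(site_url, email):
    """Build one GET request URL per engine for submitting site_url."""
    query = urlencode({"url": site_url, "email": email})
    return [base + query for base in ENDPOINTS.values()]

for u in submission_urls("http://www.example.org/", "webmaster@example.org"):
    print(u)   # a real tool would now fetch each of these URLs
```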
Some search services,
such as Yahoo!,
include human-classified listings of submitted pages.
These are difficult to keep up to date, however,
and it may take a while for a submitted page to be classified.
Many pages are never classified at all.
For Search Engines
Indexing varies among search engines
and their individual spidering programs,
both in what pages are indexed
and what features of these pages are indexed.
In any case,
indexing is by extraction;
so the exact words and phrases used on the pages are critical.
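Because indexing is by extraction, a spider's index is essentially an inverted file mapping each word on a page to the pages that contain it. A bare-bones sketch (the stopword list and sample pages are invented):

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "by", "of", "the", "to"}  # each engine has its own

def build_index(pages):
    """pages: {url: text}. Returns {word: set of urls} by simple extraction."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOPWORDS:
                index[word].add(url)
    return index

pages = {
    "a.html": "Indexing a Web site",
    "b.html": "Search engines index by extraction",
}
index = build_index(pages)
print(sorted(index["indexing"]))   # pages containing the exact word "indexing"
```

Note that "index" and "indexing" are separate entries here: unless the engine stems words, the exact wording on the page determines what a search retrieves.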
Here are some questions to ask about a search engine.
- Does the spider crawl "popular" pages more frequently?
- Will the date of the file or indexing be shown?
- How soon will submitted pages be indexed?
- How soon are linked non-submitted pages indexed?
- Are all linked pages covered?
- Are frame links followed?
- Are links in client-side image maps followed?
- Can you give the spider a password
so that it can index password-protected pages?
- Does inclusion in the index depend on number of links
to the page?
- Does the search engine learn how often pages change
and visit them accordingly?
- How does the search engine know which pages not to index?
- If a user is redirected automatically from one URL to another,
does the search engine record the second URL?
- What stopwords are excluded?
- How are location and frequency used?
- How good is the search engine
at recognizing "word stuffing" (or "keyword stuffing")
and other spam-like techniques?
- Are keywords, etc., in meta tags recognized?
- How are titles generated from Web pages?
- How are descriptions generated from Web pages?
- Are out-of-date links maintained in the index?
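Several of the questions above (location, frequency, word stuffing) concern ranking. A toy scorer illustrating the idea — this is an invented formula, not any engine's actual algorithm — where repeated words count for more but are capped to blunt stuffing, and an early first occurrence earns a bonus:

```python
import re

def score(term, text, cap=5):
    """Toy relevance score: capped frequency plus an early-position bonus."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions = [i for i, w in enumerate(words) if w == term]
    if not positions:
        return 0.0
    freq = min(len(positions), cap)            # cap blunts word stuffing
    location_bonus = 1.0 / (1 + positions[0])  # earlier first use scores higher
    return freq + location_bonus

print(score("indexing", "Indexing a Web site: indexing tools and tips"))  # → 3.0
```

Repeating a term a hundred times scores no better under this cap than repeating it five times, which is the point of anti-stuffing measures.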
Note that you may be able to submit your site to a remote search engine
and establish a customized connection
to allow visitors to search your site specifically
through that search engine.
An example is the Google SiteSearch service
(this requires you to agree that Google will be the exclusive provider
of search services on your site).
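Such a customized connection is typically just a search form on your own pages that adds a site restriction to the visitor's query. An illustrative fragment (the engine address and field names are placeholders; each service documents its own):

```html
<!-- Search this site only, via a hypothetical remote search engine -->
<form method="get" action="http://search.example.com/search">
  <input type="hidden" name="site" value="www.example.org">
  <input type="text" name="q">
  <input type="submit" value="Search this site">
</form>
```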
For More Information
- American Society of Indexers. 2007.
American Society of Indexers: Indexing the Web.
(Discusses back-of-the-book style site indexing,
and provides links to some indexed sites
and to further information on search engine technologies.)
- Browne, G.; Jermey, J. 2004.
Enhancing Access to Information within Websites, 2nd edition.
- Green, D.; Ash, J. 2006.
ITC332 Site Operations for Webmasters - Indexing.
Charles Sturt University.
(Explains using Perl to keyword index a site
and also discusses Harvest, robots and search engines, and metadata.)
- Stephenson, M.S. 2007.
School of Library, Archival and Information Studies -- UBC.
(Many links to articles and other sources of information.)
- Web Indexing SIG. 2007.
Web Indexing SIG.
(A special interest group of the American Society of Indexers.)
- Broccoli Information Management. 2006.
Web Site and Intranet Indexing by Broccoli Information Management.
(Promotional page for a professional indexing
and vocabulary control service.)
- Brown Inc. 2006.
HTML Indexer.
(HTML Indexer, a professional tool for indexing HTML files,
US$239.95, with free downloadable demonstration version.)
- Craven, T.C. 2007.
Tim Craven - Freeware.
(XRefHT32, a simple HTML index generator, free.)
- Hedden, H. 2005.
A-Z Web Site Indexes Explained >
How Alphabetical Indexes Work.
(Explains how alphabetical site indexes work,
emphasizing advantages of A-Z indexes over use of search engines.)
- Lamb, J.A. 2006.
Website Indexes: Visitors to Content in Two Clicks.
(Further information: http://www.lulu.com/lamb/).
- Leise, F. 2002.
Improving Usability with a Website Index
- Boxes and Arrows: The design behind the design.
(A general introduction.)
For links to many more examples, see LIS 523/5 - Search Tools.
- Freefind.com. 2006.
Web Site Search Engine, Free and Pro Versions - FreeFind.com.
(Offers free site searching supported by advertising
and fee-based site searching without advertising.)
- Google. 2007.
Google Web Search and Site Search.
(Free site searching with advertisements
or for educational institutions,
and fee-based plans.)
- Richmond, A. 2007.
"WDVL: META tagging for search engines".
Web Developer's Virtual Library.
(Describes use of META tags for specifying to search engines
how you would like your document to be indexed.)
Last updated April 20, 2007.
This page maintained by
Prof. Tim Craven
E-mail (text/plain only): firstname.lastname@example.org
Faculty of Information and Media Studies
University of Western Ontario
London, Ontario, Canada, N6A 5B7