Use of Web Page Titles in Lists of Links

(Unpublished paper, 2003)

Timothy C. Craven
Faculty of Information and Media Studies
Middlesex College,
The University of Western Ontario,
London, Ontario N6A 5B7
Canada

Abstract

In a follow-up to an earlier study of Web bibliographies, 34 less formal lists of links were examined for uses of two different link texts: (1) the tagged title of the page linked to; (2) the title as it would appear to be from viewing the beginning of the page in the browser (apparent title). In all but 3 lists, the apparent title was more likely to be used than the tagged title. The median proportion of links using the tagged title was 21.5%, versus 42.8% for the apparent title. Convenience of copying may partly explain the preference for the apparent title. Longer lists appeared somewhat more inclined to use page titles as link texts. Use of exact page titles did not correlate significantly with proportion of links in the list that were in fact valid, suggesting that time since links were last checked is not the main factor in determining exactness of match between link texts and page titles.

Introduction

A recent study of 16 Web bibliographies (Craven, 2002) found that the proportion of titles derived from the title element in pages' HTML code was much less than that of apparent titles (titles as they would appear to be from viewing the beginning of the page in the browser); only rarely did the bibliography title match the tagged title and not the apparent title. This practice is contrary to the advice expressed by Land (2001), who says that the title element should be used, with the option of adding the contents of a page heading as a subtitle, if substantially different. On the other hand, the observed trend agrees with the view of Estivill and Urbano (1997), who recommend that the title element be used as the title only if an apparent title is missing. As noted in the report of the Web bibliography study, such advice one way or the other is, in fact, very rare, and other authors, including those in the area of library cataloguing, tend to be vague even when they mention the question of the title's origin.

The Web bibliography study was restricted to lists in which most items appeared to be in some standard bibliographic format, with bibliographic elements, such as title, author(s), and date, clearly distinguishable. Such lists appear to be relatively unusual when compared to more informal lists of links. Among contributing factors may be the difficulty of obtaining even somewhat complete bibliographic information for a typical Web page; for example, the date on the page may not match that of its content, no date may be stated at all, and the server may not provide a file date when delivering the file.

It was therefore decided, for the current study, to extend the scope to the more common, informal type of link list. More specifically, the main question addressed was the extent to which such lists used either kind of title as the linking text; that is, as the text on which the user could click in the browser and be taken to the page indicated.

Methodology

A research assistant used the Google search engine (http://directory.google.com) to search for pages matching the query "bibliography links". Beginning with the first page, the assistant visited each retrieved page in turn, looking for lists that satisfied the following criteria.

  1. The list was given in HTML format (not Word, RTF, etc.).
  2. The list was not a search engine interface where the user had to enter a query, though it could be spread over several Web pages and one or two links might have to be followed from the original page to get to the actual page(s) of the list.
  3. Each item in the list was in a separate paragraph, list item, or the like.
  4. A title or title-like phrase was associated with each item. Other information, such as annotations, might or might not also be present.
  5. When duplicates were eliminated, at least 30 items in the list had valid links to HTML versions of the items to which they referred (not just links to other formats, such as PDF, or to publishers' home pages, abstracts, or the like).

For each list chosen, the following were recorded:

  1. URL;
  2. bibliographic data (title, author, date, publisher, etc.) to the extent those could be determined;
  3. total number of items in the list;
  4. number of items with links;
  5. number of links attempted in order to reach 300 valid links (or all links in the bibliography, whichever was less);
  6. number of valid links followed.

For each link attempted, the assistant recorded the following in a file corresponding to the list from which the link was derived:

  1. text associated with the link (text enclosed between <a> and </a> tags);
  2. text of sentence or sentence-like passage in which the link was embedded;
  3. the tagged title on the page;
  4. the apparent title of the page;
  5. the subtitle as it would appear to be from viewing the beginning of the page in the browser;
  6. match categories (as described below) for the tagged title;
  7. match categories for the apparent title.
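
The first, third, and fourth of these fields could in principle be gathered programmatically rather than by hand. The following is only a sketch using Python's standard html.parser module; the class name PageParser and the sample pages are hypothetical, and taking the first <h1> element as the apparent title is a simplifying assumption, since the study relied on a human judgment of what appeared to be the title in the browser.

```python
import re
from html.parser import HTMLParser


def normalize(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


class PageParser(HTMLParser):
    """Collects the tagged title, the first <h1> text, and (href, link text) pairs."""

    def __init__(self):
        super().__init__()
        self.tagged_title = None    # contents of the first <title> element
        self.first_heading = None   # assumption: stand-in for the apparent title
        self.links = []             # (href, link text) for each <a href=...>
        self._open = {}             # tag name -> text chunks collected so far
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "a"):
            self._open[tag] = []
            if tag == "a":
                self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Text belongs to every element of interest that is currently open.
        for chunks in self._open.values():
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag not in self._open:
            return
        text = normalize("".join(self._open.pop(tag)))
        if tag == "title" and self.tagged_title is None:
            self.tagged_title = text
        elif tag == "h1" and self.first_heading is None:
            self.first_heading = text
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, text))


# Hypothetical list page and target page, for illustration only.
list_page = "<ul><li><a href='http://example.org/faq'>Widget\n FAQ</a> (annotated)</li></ul>"
target_page = (
    "<html><head><title>ACME: Widget FAQ</title></head>"
    "<body><h1>Widget FAQ</h1><p>Questions and answers.</p></body></html>"
)

list_parser = PageParser()
list_parser.feed(list_page)
target_parser = PageParser()
target_parser.feed(target_page)

print(list_parser.links)            # link texts found in the list
print(target_parser.tagged_title)   # tagged title of the page linked to
print(target_parser.first_heading)  # rough proxy for the apparent title
```

A heading-based proxy will miss apparent titles that are rendered as images or as styled paragraphs, which is one reason manual inspection was the more dependable choice.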

The following mutually exclusive match categories were used:

Results

The number of items in the lists ranged from 39 to 1597, with a median of 85. Only 23 out of a total of 4937 items, or less than 0.5%, were not links, and 10 of these came from a single list. The proportion of links attempted that proved valid ranged from 50.6% to 100.0%, with a median of 77.3%. For exact matches on the link text, proportions for tagged titles ranged from 8.1% to 71.4%, with a median of 21.5%; proportions for apparent titles ranged from 12.7% to 64.5%, with a median of 42.8%. In all but 3 lists, the proportion for the apparent title was greater than that for the tagged title. For total mismatches on the link text, proportions for tagged titles ranged from 0.0% to 24.4%, with a median of 9.6%; proportions for apparent titles ranged from 0.0% to 22.0%, with a median of 6.9%.
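
To make the per-list proportions and their medians concrete, the following sketch computes them for three small, entirely hypothetical lists. It assumes that an exact match means string equality after trimming surrounding whitespace; the function name exact_share and the sample records are illustrative and are not taken from the study's data.

```python
from statistics import median

# Hypothetical records: (link text, tagged title, apparent title) for each link,
# grouped by the list in which the link appeared.
lists = {
    "list_a": [
        ("Widget FAQ", "ACME: Widget FAQ", "Widget FAQ"),
        ("ACME Home", "ACME Home", "Welcome to ACME"),
        ("Gadget Guide", "Guide", "Gadget Guide"),
        ("Contact us", "ACME: Contact", "Contact"),
    ],
    "list_b": [
        ("History of Gears", "History of Gears", "History of Gears"),
        ("Gear ratios", "gears2.html", "Understanding Gear Ratios"),
    ],
    "list_c": [
        ("Spring Basics", "Untitled Document", "Spring Basics"),
        ("Lever Types", "Levers", "Lever Types"),
        ("Pulley Systems", "Pulleys - Part 1", "Pulley Systems"),
        ("More on cams", "Cams", "Cam Mechanisms"),
    ],
}


def exact_share(records, title_index):
    """Proportion of records whose link text equals the chosen title exactly."""
    hits = sum(1 for r in records if r[0].strip() == r[title_index].strip())
    return hits / len(records)


# Index 1 is the tagged title, index 2 the apparent title.
tagged = {name: exact_share(recs, 1) for name, recs in lists.items()}
apparent = {name: exact_share(recs, 2) for name, recs in lists.items()}

median_tagged = median(tagged.values())
median_apparent = median(apparent.values())

# Lists in which the apparent title is matched more often than the tagged title.
apparent_ahead = [name for name in lists if apparent[name] > tagged[name]]

print(f"median tagged: {median_tagged:.1%}, median apparent: {median_apparent:.1%}")
print("apparent title ahead in:", apparent_ahead)
```

Because the medians are taken over per-list proportions, a very long list carries no more weight than a short one.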

As a test of the extent to which datedness of a list might account for the degree of match between link texts on the page and current page titles, proportion of valid links was graphed against both proportion of exact matches to tagged titles and proportion of exact matches to apparent titles. Both graphs showed no significant association.
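
The comparison here was made by graphing. One way to put a number on such an association is a correlation coefficient, which is a different technique from the one reported. The sketch below computes Pearson's r for paired per-list proportions; the function name pearson_r and the sample values are hypothetical and do not reproduce the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-list values: proportion of valid links and proportion of
# exact matches to the tagged title.
valid_links = [0.51, 0.66, 0.77, 0.84, 0.93, 1.00]
tagged_exact = [0.22, 0.09, 0.31, 0.18, 0.25, 0.12]

r = pearson_r(valid_links, tagged_exact)
print(f"r = {r:.3f}")
```

A value of r near zero would be consistent with the absence of a visible trend in a scatter plot, although a formal significance test would also need to take the number of lists into account.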

A two-by-two comparison of counts of shorter/longer lists against lower/higher exact matches for tagged titles using a chi-square test showed a marginally significant association (p=0.0164, df=1). A similar but weaker association, which did not reach statistical significance, was observed between list length and proportion of exact matches for apparent titles (p=0.3035, df=1).
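
For readers who wish to see how a test of this kind is computed, the following sketch implements a Pearson chi-square test for a two-by-two table with one degree of freedom. It assumes no continuity correction, which the text does not specify, and the cell counts are hypothetical: they were chosen only because, for 34 lists split evenly on both variables, they give p-values close to those reported. The actual tables and cut points are not given here.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the statistic is the
    square of a standard normal variable, so the upper-tail probability is
    erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts. Rows: shorter lists, longer lists.
# Columns: lower, higher proportion of exact matches.
stat_tagged, p_tagged = chi_square_2x2(12, 5, 5, 12)
stat_apparent, p_apparent = chi_square_2x2(10, 7, 7, 10)

print(f"tagged titles:   chi-square = {stat_tagged:.3f}, p = {p_tagged:.4f}")
print(f"apparent titles: chi-square = {stat_apparent:.3f}, p = {p_apparent:.4f}")
```

With only 34 lists, a continuity correction or an exact test would give larger p-values, so the uncorrected figures are best read as indicative.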

Discussion

It might be expected that the rate at which link texts, which might or might not be intended to represent page titles, match page titles is somewhat lower than that for elements of more formal bibliographic entries. In the earlier study, the rates for apparent titles had ranged from around 60% to around 95%; and those for tagged titles, from around 10% to nearly 60%. The range for apparent titles in the present study (around 15% to 65%) does appear to be lower, while a difference for tagged titles (around 10% to 70%) is less evident.

As noted in reporting the earlier study, the convenience factor should probably not be ignored as a possible explanation for at least some part of the preference for the apparent title. Popular Web browsers, including Netscape Navigator, Internet Explorer, and Opera, all make it fairly easy to copy text from the main display. To copy a tagged title, however, is more involved. Typically, one must call up a view of the source code, find the tagged title within that source, and only then copy the title. To the beginner, it may not even be clear (as in the case of some versions of Navigator) how text can be copied from the source. Addition of features to browsers to allow easier capture of obscured or hidden page elements, such as tagged titles and metatags, might have a substantial effect on future preferences.

When a Web page list is compiled directly from a hot list ("favorites" in Internet Explorer, "bookmarks" in Netscape Navigator or Opera), the result by default will in fact be that the tagged titles are used. It would appear, however, that this method is employed relatively rarely; there was certainly no indication of its having been applied in constructing any of the lists studied.

Some possibility of bias existed in the present study, since the research assistant who extracted the apparent title and subtitle from a Web page had already seen the entry in the list. A more rigorous methodology might have involved two assistants, one collecting the list items and links and the other subsequently examining the pages referenced. Having another research assistant revisit the pages at a later date might in any case be an interesting follow-up.

In the area of follow-up, future work might address the relative stability of tagged and apparent titles. If tagged titles turned out to be substantially more constant over time, that might be an argument in favor of employing them in citations, in spite of their other disadvantages in comparison with apparent titles.

Given the vagueness of the standard cataloguing rules with regard to the chief source of information for the title of a computer file, a study of actual cataloguing practice might be of interest. Do cataloguers of Web pages in fact tend to take titles from the main display window? In the few cases where information in the main display window is less complete than in the tagged title, is the tagged title preferred, or is some other information source employed?

Acknowledgments

Research reported in this article was supported in part by the University of Western Ontario Office of Research Services with funds provided by the Natural Sciences and Engineering Research Council of Canada.

The extensive assistance of research assistant Craig Huffman in data gathering is also acknowledged.

References

Last updated January 25, 2008, by Tim Craven