Software supporting various abstracting-assistance features, including thesaurus-based functions, is being prototyped in a text network management system known as TEXNET [Craven, 1988; Craven, 1991b; Craven, 1993a].
To suit the type of document and the task at hand, the current version of TEXNET allows the user to adjust a number of parameters. One class comprises those related to the weighting of text segments; an example is the minimum length of extracts. Various parameters may also be set for thesaurus-related operations.
Each link type is identified by number. It may also be given a mnemonic (such as "BT"), which may be changed at will depending on the type of display desired. Pairs of link types may be set as inverses (for example, a "BT" and an "NT" link type); a link type may be its own inverse (for example, "RT").
In addition, each link type is assigned a weight. A link type that implies a close conceptual relationship, such as near-synonymy, will typically receive a higher weight than one that implies a more remote relationship. The primary purpose of these weights lies in "exploding" keywords, as explained below.
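The link-type settings just described can be pictured as a small table. The following Python sketch is illustrative only: the class names, the numbering, and the weight values are hypothetical, not taken from TEXNET. It records each type's number, changeable mnemonic, weight, and inverse, and makes inverse pairing symmetric, so that setting "NT" as the inverse of "BT" also sets the reverse.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LinkType:
    number: int                    # link types are identified by number
    mnemonic: str                  # display label such as "BT"; may be changed at will
    weight: float                  # higher for closer conceptual relationships
    inverse: Optional[int] = None  # number of the inverse link type, if any


class LinkTypeTable:
    """Hypothetical registry of link types (illustrative only)."""

    def __init__(self) -> None:
        self.types: Dict[int, LinkType] = {}

    def add(self, number: int, mnemonic: str, weight: float) -> None:
        self.types[number] = LinkType(number, mnemonic, weight)

    def set_inverses(self, a: int, b: int) -> None:
        # Pairing is symmetric; passing a == b makes a type its own inverse.
        self.types[a].inverse = b
        self.types[b].inverse = a

    def inverse_of(self, number: int) -> Optional[int]:
        return self.types[number].inverse


table = LinkTypeTable()
table.add(1, "BT", 0.6)   # hypothetical weights: a remote relationship
table.add(2, "NT", 0.6)   # such as "RT" gets less than "BT"/"NT"
table.add(3, "RT", 0.4)
table.set_inverses(1, 2)
table.set_inverses(3, 3)  # "RT" is its own inverse
```

Because the number, not the mnemonic, identifies a type in this sketch, relabelling for a different display (for example `table.types[1].mnemonic = "broader"`) leaves the inverse pairings and weights untouched.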
The graphic display shown in Figure 1 is derived from an explosion of the word "RESEARCH", which, in this case, is also a thesaurus term. In such graphic displays, different types of arrows indicate different types of links between terms. Here, a solid double-headed arrow indicates a link between related terms, a solid single-headed arrow points from a broader term to a narrower term, and a broken single-headed arrow points from a non-preferred term to a preferred term. The user can set or reset the arrow style used to represent any link type.
Box style is used here to indicate the weight attached to a term in the explosion. A solid outline reflects the highest weights. It is obviously appropriate that this style should be used for the seed term, as it is for "RESEARCH" in the example. A dotted outline is applied to terms of middle weight; in the illustration, it appears around the immediate broader term "WORK". The remaining terms, with the lowest weights, are shown within dashed outlines. In the example, "SCIENCE", "SOCIAL SCIENCE", and "STUDIES" received lower weights than "WORK" because a lower weight had been assigned to the "RT" link type than to the "BT" link type.
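The text does not spell out how link-type weights combine during an explosion. One plausible rule, assumed here purely for illustration, gives the seed a weight of 1.0 and every other term the product of the link-type weights along its strongest path from the seed; a best-first search computes this, and a hypothetical pair of thresholds then maps weights to the three box styles. With a higher weight for "BT" than for "RT", the sketch reproduces the ordering described for Figure 1, though all the numbers are invented.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical link-type weights: "BT" outranks "RT", as in the example.
LINK_WEIGHTS = {"BT": 0.6, "NT": 0.6, "RT": 0.4}


def explode(seed: str, links: Dict[str, List[Tuple[str, str]]],
            min_weight: float = 0.1) -> Dict[str, float]:
    """Weight each term by the best product of link weights from the seed.

    links maps a term to (link_type, target_term) pairs. Because every
    link weight is at most 1, products never grow along a path, so a
    best-first (Dijkstra-style) search finds the best product exactly.
    """
    weights = {seed: 1.0}
    heap = [(-1.0, seed)]
    while heap:
        neg_w, term = heapq.heappop(heap)
        w = -neg_w
        if w < weights[term]:
            continue  # stale entry: a better path to this term was found
        for link_type, target in links.get(term, []):
            new_w = w * LINK_WEIGHTS[link_type]
            if new_w >= min_weight and new_w > weights.get(target, 0.0):
                weights[target] = new_w
                heapq.heappush(heap, (-new_w, target))
    return weights


def box_style(weight: float, high: float = 0.9, middle: float = 0.5) -> str:
    # Hypothetical thresholds for the solid / dotted / dashed outlines.
    if weight >= high:
        return "solid"
    if weight >= middle:
        return "dotted"
    return "dashed"


# A hypothetical fragment of the structure behind Figure 1.
links = {
    "RESEARCH": [("BT", "WORK"), ("RT", "SCIENCE"),
                 ("RT", "SOCIAL SCIENCE"), ("RT", "STUDIES")],
    "WORK": [("NT", "RESEARCH")],
}
weights = explode("RESEARCH", links)
```

Here "RESEARCH" gets a solid box, "WORK" a dotted one, and the three related terms dashed ones, simply because 0.6 exceeds 0.4.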
The user may choose among several algorithms for positioning the terms in a graphic display. Figure 1 shows the result of applying the default algorithm, which is an adaptation of one originally described by Watanabe [Watanabe, 1989]. One of the chief aims of the Watanabe algorithm is to bring linked nodes together and to keep unlinked nodes apart. It also aims to keep the lines that indicate links from crossing one another or crossing node boxes. One advantage of this algorithm is that it produces a compact display for a large set of terms, provided the density of term links does not become too great at any point.
A notable disadvantage of the Watanabe-style algorithm, for thesaurus displays, as well as a number of other kinds of displays, is its failure to take account of link type directionality. In Figure 1, note how a narrower term may appear above ("ADMINISTRATION"), below ("TEACHING", "EMPLOYEES"), or to the left of ("RESEARCH") the corresponding broader term; it is largely chance that none appears to the right.
Two of the other available algorithms give strong weight to the directionality of certain link types. Each of these yields a much more obviously hierarchical layout, in which narrower terms always appear to the right of their broader terms. There are, however, tradeoffs: for one thing, the double-headed arrows that typically indicate related-term links can end up crossing boxes and other arrows to such an extent that the result is undecipherable.
What seemed desirable was some sort of hybrid, which would retain good features of the Watanabe-style display while providing the added user assistance of some degree of directionality. What appears to be the best such approach to date, one that has in fact been implemented as part of the package, will now be described.
It should first be understood that a major part of the Watanabe algorithm involves weighting various potential positions before placing a node. Position weights are determined first on the basis of position relative to any already placed nodes that are directly linked to the node to be placed. Many conceivable but unsuitable positions are in fact eliminated by this step. Second, account is taken of any new arrow crossovers that may result, with each position being penalized for each crossover that it would create.
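The two-step position weighting can be sketched as follows. This is a simplified illustration under assumptions made only for this example, not a reconstruction of Watanabe's procedure or of TEXNET's code: nodes sit on an integer grid, proximity to already placed linked nodes is scored as negative Chebyshev distance, positions that are occupied or farther than a hypothetical `max_dist` from a linked node are eliminated, and each proper crossing between a new link line and an existing one costs a fixed hypothetical penalty.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, int]


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the turn a -> b -> c: +1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True for a proper crossing; lines sharing an endpoint do not count."""
    if len({p1, p2, q1, q2}) < 4:
        return False
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def weigh_positions(candidates: Iterable[Point], linked: List[str],
                    placed: Dict[str, Point], edges: List[Tuple[str, str]],
                    max_dist: int = 2,
                    crossing_penalty: float = 5.0) -> Dict[Point, float]:
    """Weight candidate positions for a new node (higher is better).

    linked: already placed nodes directly linked to the new node.
    edges:  links already drawn between placed nodes.
    """
    occupied = set(placed.values())
    weights: Dict[Point, float] = {}
    for pos in candidates:
        if pos in occupied:
            continue
        # Step 1: position relative to linked nodes (Chebyshev distance).
        dists = [max(abs(pos[0] - placed[n][0]), abs(pos[1] - placed[n][1]))
                 for n in linked]
        if dists and max(dists) > max_dist:
            continue  # eliminated: too far from a linked node
        weight = -float(sum(dists))
        # Step 2: penalize each new crossover with an existing link line.
        for n in linked:
            for a, b in edges:
                if segments_cross(pos, placed[n], placed[a], placed[b]):
                    weight -= crossing_penalty
        weights[pos] = weight
    return weights


placed = {"A": (0, 0), "B": (2, 2), "C": (0, 2)}
w = weigh_positions([(1, 0), (0, 1), (0, 2), (5, 5)], ["C"], placed, [("A", "B")])
```

In the example, (0, 2) is occupied and (5, 5) is too far away, so both are eliminated; (1, 0) survives but loses to (0, 1) because its line to "C" would cross the existing A-B line.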
An initial approach was to define, for each link type, an ideal position relative to the already placed linked node and to penalize each position in proportion to its distance from that ideal. Although this approach produced potentially useful results, it was not considered entirely satisfactory; a major drawback was its tendency to produce less compact displays.
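The first approach amounts to a distance-based penalty. In this minimal sketch the ideal offsets, the penalty rate, and the coordinate convention (x increasing to the right) are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical ideal offsets (dx, dy) of a new node from an already placed
# node, keyed by the link type from the placed node to the new one: a
# narrower term (NT) ideally sits two columns to the right, and so on.
IDEAL_OFFSET: Dict[str, Point] = {"NT": (2.0, 0.0), "BT": (-2.0, 0.0)}


def ideal_position_penalty(pos: Point, anchor: Point, link_type: str,
                           rate: float = 1.0) -> float:
    """Penalty proportional to the distance from the ideal position."""
    dx, dy = IDEAL_OFFSET[link_type]
    ideal = (anchor[0] + dx, anchor[1] + dy)
    return rate * math.dist(pos, ideal)


print(ideal_position_penalty((5, 4), (0, 0), "NT"))  # 5.0
```

One possible reading of the compactness problem noted above: every position except the single ideal point is penalized, so a free spot close to the linked node can lose to a more distant spot that happens to lie nearer the ideal.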
A second approach was therefore substituted. Here, the ideal is simply a pair of binary direction choices: up or down, left or right. A position on the ideal side is given a bonus, one on the opposite side is assigned a penalty, and one in the middle retains its original weight. As well as selecting the ideal directions, the user indicates a degree of preference for each. This degree of preference equals the amount of bonus or penalty that will be used in the position weighting.
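The second approach replaces the distance-based penalty with a three-way adjustment on each axis. In this minimal sketch each preference is a signed number whose sign gives the ideal side and whose magnitude is the bonus or penalty; the function names and the screen-style coordinate convention (y growing downward) are assumptions for illustration. The result would be added to the position weight computed from proximity and crossovers.

```python
from typing import Tuple

Point = Tuple[int, int]


def _side_adjustment(delta: int, preference: float) -> float:
    """Bonus on the ideal side, penalty on the opposite side, 0 in the middle."""
    if delta == 0 or preference == 0:
        return 0.0  # middle position (or no preference): weight unchanged
    if (delta > 0) == (preference > 0):
        return abs(preference)   # ideal side: bonus
    return -abs(preference)      # opposite side: penalty


def directional_adjustment(pos: Point, anchor: Point,
                           pref_x: float, pref_y: float) -> float:
    """pref_x > 0 prefers right, < 0 left; pref_y > 0 prefers down, < 0 up."""
    return (_side_adjustment(pos[0] - anchor[0], pref_x)
            + _side_adjustment(pos[1] - anchor[1], pref_y))


# A narrower term with a preference of 2.0 for "right" and 1.0 for "up":
print(directional_adjustment((1, 1), (0, 0), 2.0, -1.0))  # 1.0
```

Raising the magnitude of `pref_x` strengthens the pull toward the right without changing which side is preferred, which corresponds to the stronger "easterly" setting used for Figure 4.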
A series of compass roses is displayed, one for each arrow style used. The user changes the direction and length of the arrow within a compass rose by "clicking on" the position where the arrow's outer end should lie. In Figure 2, the user has just changed the "<---" compass rose in this way.
Certain arrow styles are paired: changing the setting of one invokes an inverse change in the other, and both changes are immediately displayed in the corresponding compass roses.
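The interaction just described might be modelled as below. This is a hypothetical sketch, not TEXNET's interface code: each rose stores its arrow as the vector from its center to the clicked point, and a paired rose receives the negated vector, one natural reading of the "inverse change". Treating "--->" and "<---" as a pair is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vector = Tuple[float, float]


@dataclass
class CompassRose:
    center: Vector                # screen position of the rose's center
    vector: Vector = (0.0, 0.0)   # current arrow: outer end minus center


class RosePanel:
    """Hypothetical panel holding one compass rose per arrow style."""

    def __init__(self) -> None:
        self.roses: Dict[str, CompassRose] = {}
        self.pairs: Dict[str, str] = {}

    def add_rose(self, style: str, center: Vector) -> None:
        self.roses[style] = CompassRose(center)

    def pair(self, a: str, b: str) -> None:
        self.pairs[a] = b
        self.pairs[b] = a

    def click(self, style: str, x: float, y: float) -> None:
        # The clicked point becomes the outer end of the arrow.
        rose = self.roses[style]
        v = (x - rose.center[0], y - rose.center[1])
        rose.vector = v
        # A paired style gets the inverse change (same length, opposite way);
        # a style paired with itself is left as clicked in this sketch.
        partner = self.pairs.get(style)
        if partner is not None and partner != style:
            self.roses[partner].vector = (-v[0], -v[1])


panel = RosePanel()
panel.add_rose("--->", (50, 50))
panel.add_rose("<---", (150, 50))
panel.pair("--->", "<---")
panel.click("<---", 170, 50)  # arrow of length 20 pointing right ("east")
```

After the click, the "<---" rose holds (20, 0) and the "--->" rose has been updated to (-20, 0), so both roses can be redrawn at once.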
Each compass-rose arrow may be understood as a vector. The sign of the vector's X component indicates the preference for a left or right direction; the absolute value, the degree of that preference. The Y component has a similar relation to the vertical direction.
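Reading a compass-rose vector as a pair of preferences is then straightforward. The sketch assumes screen coordinates, with Y growing downward, so a negative Y component means "up"; the function name is hypothetical.

```python
from typing import Tuple

Preference = Tuple[str, float]


def vector_to_preferences(vx: float, vy: float) -> Tuple[Preference, Preference]:
    """Split a compass-rose vector into (direction, degree) pairs."""
    # Sign of X picks left or right; |X| is the degree of that preference.
    horizontal = ("right" if vx > 0 else "left" if vx < 0 else "none", abs(vx))
    # Screen convention assumed: positive Y means "down", negative means "up".
    vertical = ("down" if vy > 0 else "up" if vy < 0 else "none", abs(vy))
    return horizontal, vertical


print(vector_to_preferences(3.0, -1.5))  # (('right', 3.0), ('up', 1.5))
```

Kept as signed numbers, the same two components can serve directly as the horizontal and vertical degrees of preference used in the position weighting.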
Figure 3 shows an example of a display to which the directional biases of Figure 2 have been applied. (The term cut off at the top is "RESEARCHERS".) We can see that there is in fact a greater tendency for narrower terms to appear to the right than there was in the default format shown in Figure 1. Nevertheless, the relative placement of "PERSONNEL" and "WORKERS" clearly still goes against the user's preferences.
In Figure 4, the user has specified a stronger preference for an "easterly" placement of narrower terms. Now no narrower term is placed to the left of its corresponding broader term, though "PERSONNEL" is not actually to the right of "WORKERS".
What is the exact nature of the tradeoff involved in introducing directional biasing? Experience so far suggests that crossovers may not increase significantly but that display compactness may be degraded to some extent.
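One way to put numbers on this tradeoff is to measure each finished display. The sketch below is an illustration built on stated assumptions rather than a procedure from this paper: it counts proper crossings between link lines and reports the bounding-box area, in grid cells, as a crude inverse measure of compactness. Running it on the same term set with and without directional bias would give directly comparable figures.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the turn a -> b -> c: +1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _crosses(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    # Count proper crossings only; lines meeting at a shared node do not count.
    if len({p1, p2, q1, q2}) < 4:
        return False
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def layout_metrics(positions: Dict[str, Point],
                   edges: List[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (number of link crossings, bounding-box area in grid cells)."""
    segments = [(positions[a], positions[b]) for a, b in edges]
    crossings = sum(1 for s, t in combinations(segments, 2) if _crosses(*s, *t))
    xs = [p[0] for p in positions.values()]
    ys = [p[1] for p in positions.values()]
    area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return crossings, area


pos = {"A": (0, 0), "B": (2, 2), "C": (0, 2), "D": (2, 0)}
print(layout_metrics(pos, [("A", "B"), ("C", "D")]))  # (1, 9)
```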
Is the increase in graph-drawing time significant? It is estimated that the additional calculation time involved is minor, and no indication of significant delays has been observed so far. But more specific figures might be desirable.
Given that the directional bias is only a bias, and not an overriding rule, are the results actually helpful to users? Might partial directionality introduce exaggerated expectations that could lead to misreading of displays? Research here could range all the way from asking respondents simple factual questions like "What are the narrower terms for 'WORKERS' in this display?" to more sophisticated studies of effects on abstracts produced.
Might another form of interface be superior to the compass-rose display? For example, suppose users were permitted to drag boxes around in sample term-link displays until they found pleasing arrangements. Could such user-defined arrangements be employed towards automatic generation of appropriately biased displays of other term-link structures?
It might be useful to provide for some sort of dimensional preference in addition to the current directional preferences: that is, being able to specify that a term linked in a given way should be positioned above or below, but not to the left or right, of the term to which it is linked, or vice versa. This would be especially applicable to a link type, such as "RT", that is its own inverse.
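Such a dimensional preference could reuse the bonus, unchanged, penalty pattern of the directional scheme. The sketch below is a hypothetical extension, not a feature of the package: a vertical preference rewards positions directly above or below the linked term, penalizes positions directly beside it, and leaves diagonal positions unchanged. Because the score depends only on the axis and not on the sign of the offset, swapping the two terms leaves it the same. That symmetry suits a self-inverse link type, whereas a directional preference on "RT" would reward a placement from one term's point of view and penalize it from the other's.

```python
from typing import Tuple

Point = Tuple[int, int]


def dimensional_adjustment(pos: Point, anchor: Point, axis: str,
                           degree: float) -> float:
    """Hypothetical dimensional preference.

    axis == "vertical":   prefer above/below, avoid left/right.
    axis == "horizontal": prefer left/right, avoid above/below.
    """
    dx, dy = pos[0] - anchor[0], pos[1] - anchor[1]
    along, across = (dy, dx) if axis == "vertical" else (dx, dy)
    if across == 0 and along != 0:
        return degree    # directly on the preferred axis: bonus
    if along == 0 and across != 0:
        return -degree   # directly on the other axis: penalty
    return 0.0           # diagonal positions keep their weight


print(dimensional_adjustment((0, -3), (0, 0), "vertical", 2.0))  # 2.0
```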
Craven, T.C.: Text network display editing with special reference to the production of customized abstracts. Canadian journal of information science. 13 (1988) No.1/2, p.59-68.
Craven, T.C.: Algorithms for graphic display of sentence dependency structures. Information processing and management. 27 (1991b) No.6, p.603-613.
Craven, T.C.: A computer-aided abstracting tool kit. Canadian journal of information science. 18 (1993a) No.2, p.19-31.
Craven, T.C.: A thesaurus for use in a computer-aided abstracting tool kit. In: Bonzi, S.: ASIS '93: proceedings of the 56th ASIS Annual Meeting (1993, volume 30), Columbus, Ohio, October 24-28, 1993. Medford, New Jersey: Learned Information. 1993b. p.178-184.
Schmitz-Esser, W.: New approaches in thesaurus application. International classification. 18 (1991) No.3, p.143-147.
Watanabe, H.: Heuristic graph displayer for G-BASE. International journal of man-machine studies. 30 (1989) No.3, p.287-302.
The directional bias feature described in this paper is still available in TexNet32, the current version of TexNet, which can be downloaded from http://publish.uwo.ca/~craven/freeware.htm#texnet32
Last updated January 24, 2008, by Tim Craven