Journal of Geographic Information and Decision Analysis, vol.1, no.1, pp. 69-82 

Three Fastest Shortest Path Algorithms on Real Road Networks: Data Structures and Procedures

F. Benjamin Zhan
Department of Geography and Planning,  Southwest Texas State University,  San Marcos, TX 78666,  USA
fz01@swt.edu
http://www.swt.edu/~fz01/ 



ABSTRACT It is well known that computing shortest paths over a network is an important task in many network and transportation related analyses.  Choosing an adequate algorithm from the numerous algorithms reported in the literature is a critical step in many applications involving real road networks.  In a recent study, a set of three shortest path algorithms that run fastest on real road networks has been identified.  These three algorithms are: 1) the graph growth algorithm implemented with two queues, 2) the Dijkstra algorithm implemented with approximate buckets, and 3) the Dijkstra algorithm implemented with double buckets.  As a sequel to that study, this paper reviews and summarizes these three algorithms, and demonstrates the data structures and procedures related to the algorithms.  This paper should be particularly useful to researchers and practitioners in transportation, GIS, operations research and management sciences.
KEYWORDS: shortest path algorithms, GIS, transportation, networks

Acknowledgments This research was supported in part by a Faculty Research Enhancement Grant from Southwest Texas State University and in part by a copy of UNIX Arc/Info from Environmental Systems Research Institute, Inc (ESRI). The author also wishes to thank Boris V. Cherkassky, Andrew V. Goldberg and Tomasz Radzik for making their one-to-all shortest paths C source codes available. 



 Contents
1. Introduction
With the development of geographic information systems (GIS) technology, network and transportation analyses within a GIS environment have become a common practice in many application areas.  A key problem in network and transportation analyses is the computation of shortest paths between different locations on a network.  Sometimes this computation has to be done in real time.  For the sake of illustration, let us have a look at the case of a 911 call requesting an ambulance to rush a patient to a hospital.  Today it is possible to determine the fastest route and dispatch an ambulance with the assistance of GIS.  Because a link on a real road network in a city tends to possess different levels of congestion during different time periods of a day, and because a patient's location can not be expected to be known in advance, it is practically impossible to determine the fastest route before a 911 call is received.  Hence, the fastest route can only be determined in real time.  In some cases the fastest route has to be determined in a few seconds in order to ensure the safety of a patient.  Moreover, when large real road networks are involved in an application, the determination of shortest paths on a large network can be computationally very intensive.  Because many applications involve real road networks and because the computation of a fastest route (shortest path) requires an answer in real time, a natural question to ask is: Which shortest path algorithm runs fastest on real road networks?
        Although considerable empirical studies on the performance of shortest path algorithms have been reported in the literature (Dijkstra 1959; Dial et al. 1979; Glover et al. 1985; Gallo and Pallottino 1988; Hung and Divoky 1988; Ahuja et al. 1990; Mondou et al. 1991; Cherkassky et al. 1993; Goldberg and Radzik 1993), there is no clear answer as to which algorithm, or a set of algorithms runs fastest on real road networks.  In a recent study conducted by Zhan and Noon (1996), a set of three shortest path algorithms that run fastest on real road networks has been identified.  These three algorithms are: 1) the graph growth algorithm implemented with two queues, 2) the Dijkstra algorithm implemented with approximate buckets, and 3) the Dijkstra algorithm implemented with double buckets.  As a sequel to that study, this paper reviews and summarizes these three algorithms, and demonstrates the data structures and implementation strategies related to the algorithms.
        The rest of the paper is organized as follows.  Recent evaluations, particularly the evaluation of 15 shortest path algorithms using real road networks, are briefly reviewed in Section 2.  Network representation, the labeling method and data structures related to shortest path algorithms in general are reviewed in Section 3.  The graph growth algorithm implemented with two queues is described in detail in Section 4.  The approximate and double bucket implementations of the Dijkstra algorithm are reviewed in Section 5.  Concluding remarks are given in Section 6.

2. Recent Evaluations of Shortest Path Algorithms

A network is defined as a directed graph G = (N, A) consisting of a set N of nodes and a set A of arcs with associated numerical values, such as the number of nodes, n=|N|, the number of arcs, m=|A|, and the length of an arc connecting nodes i and j, denoted as l(i,j).  The shortest path problem can be stated as follows: given a network, find the shortest distances (least costs) from a source node to all other nodes or to a subset of nodes on the network.  These shortest paths represent a directed tree T rooted from a source node s with the characteristic that a unique path from s to any node i on the network is the shortest path to that node (Ahuja et al. 1993).  The length of the shortest path from s to any node i is denoted as d(i).  This directed tree is called a shortest path tree.  For any network with n nodes, one can obtain n distinctive shortest path trees.  Shortest paths from one (source) node to all other nodes on a network are normally referred as one-to-all shortest paths.  Shortest paths from one source node to a subset of the nodes on a network can be defined as one-to-some shortest paths.  Shortest paths from every node to every other node on a network are normally called all-to-all shortest paths.

2.1 Cherkassky et al.'s Evaluation

Although there have been a number of reported evaluations of shortest path algorithms in the literature (e.g., Glover et al. 1985; Gallo and Pallottino 1988; Hung and Divoky 1988), a recent study by Cherkassky et al. (1993) is one of the most comprehensive evaluations of shortest path algorithms to date.  They evaluated a set of 17 shortest path algorithms.  In their experiment, Cherkassky et al. coded the 17 algorithms using the C programming language, and tested the C programs on a SUN Sparc-10 workstation.  One-to-all shortest paths can be computed by these C programs.  Readers are referred to the Cherkassky et al. (1993) paper for more detailed descriptions about the implementation of the algorithms.  Cherkassky et al. used a number of simulated networks with various degrees of complexity for evaluating the algorithms.  The results of their studies suggest that no single algorithm performs consistently well on all simulated networks.

2.2 Zhan and Noon's Evaluation

More recently, Zhan and Noon (1996) tested 15 of the 17 shortest path algorithms using real road networks.  In their evaluation, Zhan and Noon dropped two of the 17 algorithms tested by Cherkassky et al.  They did not consider the special-purpose algorithm for acyclic networks because an arc on real road networks can be treated bi-directional, and hence real road networks contain cycles.  They also dropped the implementation using a stack to maintain labeled
Table 1  Summary of the 15 Algorithms Evaluated
 Abbreviation
             Implementation
     BFM
Bellman-Ford-Moore
     BFP
Bellman-Ford-Moore with Parent--checking
     DKQ
Dijkstra's Naive Implementation
     DKB
Dijkstra's Buckets -- Basic Implementation
     DKM
Dijkstra's Buckets -- Overflow Bag
     DKA
Dijkstra's Buckets -- Approximate
     DKD
Dijkstra's Buckets -- Double
     DKF
Dijkstra's Heap -- Fibonacci
     DKH
Dijkstra's Heap -- k--array
     DKR
Dijkstra's Heap -- R--Heap
     PAP
Graph Growth -- Pape
     TQQ
Graph Growth with Two Queues -- Pallottino
     THR
Threshold Algorithm
     GR1
Topological Ordering -- Basic
     GR2
Topological Ordering -- Distance Updates
 
nodes (see the next section for descriptions about stack and labeled nodes) because they found that this algorithm is many times slower than the rest of the algorithms on real road networks during their preliminary testing.  These 15 algorithms are summarized in Table 1.  It is not the intention of this paper to review these 15 algorithms thoroughly.  Detailed description of the algorithms can be found in Cherkassky et al. (1993) and the references therein.
        In their evaluation, Zhan and Noon used 21 real road networks for evaluating the shortest path algorithms.  These 21 networks included the U.S. National Highway Planning Network (NHPN) covering the continental U.S. and 20 state-level road networks generated from road networks in 10 states in the Midwest and Southeast of the United States.  The 10 states are Alabama (AL), Florida (FL), Georgia (GA), Iowa (IA), Louisiana (LA), Minnesota (MN), Missouri (MO), Mississippi (MS), Nebraska (NE), and South Carolina (SC).  The 20 state-level road networks are composed of 10 low-detail road networks and 10 high-detail road networks.  The 10 low-detail networks contain three levels of roads, including interstate highways, principal arterials and major arterials.  The 10 high-detail networks consist of one additional level of more detailed roads in addition to the three levels of roads contained in the low-detail networks.  The 21 networks were stored and maintained in Arc/Info GIS running on a SUN Sparc-20 workstation under the Solaris 2.4 environment.  The nodes, arcs and arc lengths were downloaded from Arc/Info into ASCII files.  Before downloading, a check was made to ensure that the networks were fully connected.
        A summary of the 21 networks used in Zhan and Noon's evaluation is given in Table 2. One important characteristic of a real road network is the degree of connectivity measured by the arc-to-node ratios.  It can be seen in Table 2 that the arc-to-node ratios range from 2.66 to 3.28 in the 21 networks.  The degree of connectivity in these 21 networks differ considerably from that of simulated networks where the arc-to-node ratios can be as high as 10 (cf., Gallo and Pallottino 1988).  In addition, there is no notable difference in the degrees of connectivity in all 21 networks.  Because the number of scans in constructing a shortest path tree is directly related to arc-to-node ratios, it is very important to observe this difference between the arc-to-node ratios in real road networks and simulated networks.
        The 15 algorithms were coded in the C programming language. The C programs were based on the set of one-to-all shortest path C programs provided by Cherkassky et al. (1993).  The set of one-to-all shortest path C codes were modified to automatically generate all-to-all shortest paths.  The C programs were compiled with the gcc compiler version 2.5.6 using the O4 optimization option.  Zhan and Noon's experiments were conducted on a SUN Sparc-20 workstation (model HS21 with a 125MHz Hypersparc processor and 64 Megabytes of RAM running under the Solaris 2.4 environment).  More detailed description about the experiments can be found in Zhan (1995) and Zhan and Noon (1996).

2.3 Three Fastest Algorithms on Real Road Networks

Based on their evaluation, Zhan and Noon suggested that the best performing implementation for solving the one-to-all shortest path problem is Pallottino's graph growth algorithm implemented with two queues (TQQ).  They further suggested that when the goal is to obtain a one-to-one shortest path or one-to-some shortest paths, the Dijkstra algorithm offers some advantages because it can be terminated as soon as the shortest path distance to the destination node is obtained (see Section 5).  Zhan and Noon recommended two of Dijkstra implementations.  The choice between the two implementations depends on the maximum network arc lengths.  They recommended the approximate buckets implementation of the Dijkstra algorithm (DKA) for computing one-to-some shortest paths over networks whose maximum arc length is less than 1500.   For networks whose maximum arc length is greater than 1500, they recommended that the double buckets implementation of the Dijkstra algorithm (DKD) should also be considered.
 
Table 2  Summary of the 21 real road networks used in the evaluation
No.
state
number    of nodes
number of arcs
arc/node ratio
maximum arc lengt
mean arc length
stnd. dev. of arc lengths
 1
NE
 523
  1646
3.14
0.874764
0.215551
0.142461
 2
AL
 842
  2506
2.98
0.650305
0.128870
0.114031
 3
MN
 951
  2932
3.08
0.972436
0.175173
0.132083
 4
IA
1003
  2684
2.68
0.573768
0.119900
0.113719
 5
MS
1156
  3240
2.80
0.498810
0.095443
0.100703
 6
SC
1784
  5128
2.88
0.413163
0.062156
0.064389
 7
FL
2155
  6370
2.96
0.923088
0.075247
0.076590
 8
MO
2391
  7308
3.06
0.494730
0.090977
0.064761
 9
LA
2437
  6876
2.82
1.021526
0.060662
0.067557
10
GA
2878
  8428
2.92
0.478579
0.068333
0.005668
11
LA
35793
 98880
2.76
0.360678
0.013874
0.015297
12
MS
39986
120582
3.02
0.232062
0.015412
0.014000
13
NE
44765
146476
3.28
0.528283
0.018039
0.015652
14
FL
50109
133134
2.66
0.416212
0.011207
0.015264
15
SC
52965
149620
2.82
0.163557
0.009975
0.010198
16
IA
63407
208134
3.28
0.269823
0.015733
0.009220
17
MN
65491
209340
3.20
0.410925
0.017202
0.014107
18
AL
66082
185986
2.82
0.298232
0.011383
0.012410
19
MO
67899
204144
3.00
0.212470
0.015542
0.013266
20
US
75417
205998
2.74
1.500361
0.066084
0.094758
21
GA
92792
264392
2.84
0.174245
0.010511
0.000107
Note:   The first 10 networks are low-detail road networks (three levels of roads) from the ten states.  The remaining 11 networks are the 10 high-detail road networks (four levels of roads) from the ten states plus the US National Highway Planning Network (US).  The networks are ordered by the number of nodes (After Zhan and Noon 1996).

3. Network Representations, the Labeling Method and Data Structures Related to Shortest Path Algorithms

3.1 Network Representation

The way in which an input network is represented and implemented in a shortest path algorithm is vital to the performance of the algorithm.  Past research has proven that the forward star representation is the most efficient data structure for representing networks (Gallo and Pallottino 1988; Ahuja et al. 1993 p.35-36; Cherkassky et al. 1993).  Two sets of arrays are used in the forward star data structure.  The first array is used to store data associated with arcs, and the second array is used to store data related to nodes.  All arcs of a network in question are maintained in a list and are ordered in a specific sequence.  That is, arcs emanating from nodes 1, 2, 3, ..., are ordered sequentially.  Arcs emanating from the same node can be ordered arbitrarily, however.  All information associated with an arc, such as starting node, ending node, cost, arc length and capacity are stored with the arc in some way (e.g., corresponding arrays or linked lists).
        For the array of nodes, a total of n+1 elements are needed. The i-th element associated with node i, pointer(i), stores the sequential number (in the above arc list) of the first arc emanating from node i.  There are a few exceptions: 1) for a node i that has no outgoing arc, pointer(i) is set equal to the content of the next element in the array, i.e., pointer(i) = pointer(i+1); and 2) for consistency, the following convention is adopted, i.e., pointer(1)=1 and pointer(n+1)=m+1.

3.2 The Labeling Method

The labeling method is a central procedure in most shortest path algorithms (Gallo and Pallottino 1988; Ahuja et al. 1993, p.96).  The output of the labeling method is an out-tree from a source node, s, to a set of nodes.  This out-tree is constructed iteratively, and the shortest path from s to i is obtained upon termination of the method.  Three pieces of information are maintained for each node i in the labeling method while constructing a shortest path tree: the distance label, d(i), the parent node, p(i), and the node status, S(i).  The distance label, d(i), stores the upper bound of the shortest path distance from s to i during iteration.  Upon termination of an algorithm, d(i) represents the unique shortest path from s to i.  The parent node p(i) records the node that immediately precedes node i in the out-tree.  The node status, S(i), can be one of the following: unreached, temporarily labeled and permanently labeled.   When a node is not scanned during the iteration, it is unreached.  Normally the distance label of an unreached node is set to positive infinite.  When it is known that the currently known shortest path of getting to node i is also the absolute shortest path we will ever be able to attain, the node is called permanently labeled.  When further improvement is still expected to be made on the shortest path to node i, node i is considered only temporarily labeled.  It follows that d(i) is an upper bound on the shortest path distance to node i if the node is temporarily labeled; and d(i) represents the final and optimal shortest path distance to node i if the node is permanently labeled.
        At the beginning of the iterations in the labeling method, a directed out-tree is initialized and the initial values of the above parameters d(i), p(i) and S(i) are set for source node s and every other node i accordingly (Ahuja et al. 1993).  During the scanning process, when a node i is scanned, the distance label of a successor node j is checked and an attempt is made to lower the distance label, d(j),  of node j.  If d(j) can be lowered, the out-tree is updated by changing the parent node of j to i, that is, p(j) = i.  Because d(j) is lowered, node j should ultimately become permanently labeled.  The iteration continues until all nodes become permanently labeled.  Upon termination of the iterations, the out-tree becomes a shortest path tree.  Formally, the scanning operation for node i can be described below.
           Procedure ScanningOperation(i)
             begin
               for all successor nodes of i do
                 if d(i) + l(i,j) < d(j) then
                   begin
                      d(j) = d(i) + l(i,j);
                      p(j) = i;
                      S(j) = labeled;
                   end
                S(i) = permanently labeled;
             end

3.3 Selection Rules and Data Structures

The performance of a particular shortest path algorithm partly depends on how the basic operations in the labeling method are implemented.  Two aspects are particularly important to the performance of a shortest path algorithm: 1) the strategies used to select the next temporarily labeled node to be scanned, and 2) the data structures utilized to maintain the set of labeled nodes.  We briefly review these two aspects in this subsection.  Readers can refer to Gallo and Pallottino (1988) and Ahuja et al. (1993) for more detailed discussions on these topics.
        Strategies commonly used for selecting the next temporarily labeled node to be scanned are "First In First Out" (FIFO), "Last In First Out" (LIFO) and  "Best-First-Search" (Gallo and Pallottino 1988).  It is fairly easy to see from the names of the first two search strategies that the oldest node in the set of temporarily labeled nodes is selected first in a FIFO search strategy and the newest is selected first in a LIFO strategy at each iteration. In the best-first-search strategy, the node with the minimum distance label from the set of temporarily labeled nodes is considered as the best node.
        A number of data structures can be used to manipulate the set of temporarily labeled nodes in order to support these strategies.  These data structures include arrays, singly and doubly linked lists, stacks, buckets and queues.  Detailed definitions and operations related to these data structures are standard knowledge and are well documented in the literature (e.g., Sedgewick 1990; Ahuja et al. 1993, pp.765-787).  Therefore, we only selectively review some of them.  A  singly linked list contains a collection of elements.  Each element has a data field and a link field.  The data field contains information to be stored, and the link field contains a pointer pointing to the next element in the list.  A doubly linked list differs from a singly linked list in that each element in a doubly linked list contains two pointers.  One pointer points to the previous element in the list, and another pointer points to the next element in the list.  Stack is another special type of list which only allows removal and addition of an element at one end of the list.  This end of the list is normally called the top of a stack.  The bucket data structure is described in detail in Section 5 because it is related to two of the three recommended algorithms, namely, the approximate and double bucket implementations of the Dijkstra algorithms.
        A queue is a special type of list which allows the addition of an element at the tail and the deletion of an element at the head. A priority queue is a special type of queue.  Each element in a priority queue contains a label (normally a numerical value) that can be used to determine the priority of the element in the queue.  Three operations are normally defined in a priority queue: adding a new element, removing the element that has the highest priority in the queue, and correcting the label of an element whose location in the queue is known.  When the label of an element in a priority queue is set to the distance label of a node, a priority queue can be used to maintain the set of temporarily labeled nodes efficiently.  Therefore, a priority queue is often used to implement the best-first search strategy.  A priority queue can be implemented by linked lists, binary-heaps, d-heaps and Fibonacci heaps (Ahuja et al. 1993).  The deque and two queue data structures described in the next section are particular types of priority queues which are related to the graph growth algorithm implemented with two queues.

4. The Graph Growth Algorithm Implemented With Two Queues

We describe the data structures and basic procedures related to the graph growth algorithm implemented with two queues in this section.  The two bucket implementations of the Dijkstra algorithm are described in the next section.  The graph growth algorithm implemented with two queues (TQQ) was introduced by Pallottino in 1984.  TQQ is an improved version of the growth graph implementation developed by Pape (PAP) in 1974.  Before we discuss these two implementations, let us review the basic procedure in constructing a shortest path tree as shown below (see, e.g., Pallottino 1984, p.259).
           Procedure ShortestPathTreeConstruction(s)
               begin
                 Queue_Initialization(Q);
                  for i=1 to n do
                    d(i) = + infinite;
                    d(s) = 0;
                     while (Q != Null) do
                         Queue_Removal(Q, i);
                         for each successor node j of node i do
                           if d(j) > d(i) + l(i, j) then
                              begin
                                 d(j) = d(i) + l(i, j)
                                 Queue_Insertion(Q, j)
                              end
                end

The four basic operations involved in this procedure are:
    Queue_Initialization(Q)   initialize queue Q;

    Queue_Removal(Q, i)       remove node i from queue Q;

    Queue_Insertion(Q, j)     insert node j into queue Q; and

    Q = Null?                 check whether queue Q is empty.

        The major difference between TQQ and PAP is in the Queue_Insertion(Q, j) operation.  In the implementations of PAP and TQQ, nodes are partitioned into two sets: the first set of nodes are those nodes whose current distance labels have not already been used to find a shortest path and the second set contains the remaining nodes.  The first set of nodes is maintained by a priority queue Q.   Nodes in the second set are further split into two categories: 1) the unreached nodes which have never entered Q, i.e., nodes whose distance labels are still infinite, and 2) labeled nodes, i.e., the nodes that have passed through Q at least once, and the nodes whose current distance labels have already been used.
        Pape (1974) used a data structure called deque (Q) to maintain the first set of nodes in Q. A deque is illustrated in Figure 1 (Pallottino 1984, p.261).  A deque allows insertions at either end of the queue.  In the PAP implementation, the deque consists of a LIFO stack (S) and a FIFO queue (Q').  For any node that is not already in Q, the node is inserted at the end of Q' if it is unreached; or the node is inserted at the beginning of S if it is temporarily labeled.  Therefore, the basic operations in the PAP implementation can be summarized below:

    Queue_Initialization(Q)   initialize queue Q;

    Queue_Removal(Q, i)       remove node i from the beginning of queue Q,
                              i.e., the top of stack S;

    Queue_Insertion(Q, j)     For any node j that is not already in Q,
                              insert the node at the end of Q' if the
                              node is unreached, i.e., if S(j) = unreached
                              or insert the node at the beginning of S
                              if the node is temporarily labeled; and

    Q = Null?                 check whether queue Q is empty.
 

Figure 1  The deque Q as a pair of stack (S) and queue (Q')
Figure 1 The deque Q as a pair of stack (S) and queue (Q') (after Pallottino 1984, p.261).

        Because a stack is used as a priority queue in the PAP implementation, PAP has an exponential worst-case complexity with respect to the number of nodes, i.e., O(n2^n).  A logical enhancement of the PAP algorithm is to replace the LIFO stack with a FIFO queue and construct a new data structure.  This new data structure is called two-queue (Figure 2).  Because both Q' and Q" are queues in the two-queue data structure, nodes can be inserted at the end of Q' and Q", and they can be removed from the head of Q' and Q".
 

 
Figure 2  The two-queue data structure (Q) consisting of Q" and Q'  (after Pallottino 1984, p.264).

        It follows that for any node that is not already in Q, the node is inserted at the end of Q' if it is unreached, or the node is inserted at the end of Q" if it is temporarily labeled.  This leads to the following change in the Queue_Insertion(Q, j) operation of the PAP implementation (Pallottino 1984, p.264).  Other operations remain the same.
    Queue_Insertion(Q, j)     For any node j that is not already in Q,
                                                insert the node at the end of Q' if the
                                                node is unreached, i.e., if S(j) = unreached
                                                or insert the node at the end of Q"
                                                if the node is temporarily labeled.
 

5. The Dijkstra's Algorithm Implemented With Approximate and Double Buckets

The original Dijkstra algorithm partitions all nodes into two sets: temporarily and permanently labeled nodes.  At each iteration, it selects a temporarily labeled node with the minimum distance label as the next node to be scanned (Dijkstra 1959; Ahuja et al. 1993, p.109).  Once a node is scanned, it becomes permanently labeled.  The Dijkstra algorithm terminates when all nodes become permanently labeled.  The Dijkstra algorithm is similar to the procedure for constructing a shortest path tree described in Section 4 except for the differences mentioned above.  Therefore, detailed procedure of the Dijkstra algorithm is not described further in this paper.
        In Dijkstra's original algorithm, temporarily labeled nodes are treated as a nonordered list.  This is equivalent to treating the priority queue Q in the above general procedure for shortest path tree construction as a nonordered list.  This is of course a bottleneck operation because all nodes in Q have to be visited at each iteration in order to select the node with the minimum distance label.  A natural enhancement of the original Dijkstra algorithm is to maintain the labeled nodes in a data structure in such a way that the nodes are sorted by distance labels.  The bucket data structure is just one of those structures. Buckets are sets arranged in a sorted fashion (Figure 3). Bucket k stores all temporarily labeled nodes whose distance labels fall within a certain range.  Nodes contained in each bucket can be represented with a doubly linked list.  A doubly linked list only requires O(1) time to complete an operation in each distance update in the bucket data structure.  These operations include: 1) checking if a bucket is empty, 2) adding an element to a bucket, and 3) deleting an element from a bucket.
 

Figure 3  An example of the bucket data structur
Figure 3  An example of the bucket data structure (after Ahuja et al 1993, p.114).

        Dial (1969) was the first to implement the Dijkstra algorithm using buckets.  In Dial's implementation, bucket k contains all temporarily labeled nodes whose distance labels are equal to k.  Buckets numbered 0, 1, 2, 3, ..., are checked sequentially until the first nonempty bucket is identified.  Each node contained in the first nonempty bucket has the minimum distance label by definition.  One by one, these nodes with the minimum distance label become permanently labeled and are deleted from the bucket during the scanning process.  The position of a temporarily labeled node in the buckets is updated accordingly when the distance label of a node changes.  For example, when the distance label of a temporarily labeled node is changed from d(1) to d(2), this node is moved from bucket d(1) to bucket d(2).  This process is repeated until all nodes are permanently labeled.  Dial's original implementation of the Dijkstra algorithm (DKB) requires nC+1 buckets in the worst case, where C is the maximum arc length of a network.  However, it has been proven that for a network with a maximum arc length of C, only C+1 buckets are needed to maintain all temporarily labeled nodes (Ahuja et al. 1993, pp.113-114).
        It can be seen that the memory requirement in DKB can be prohibitively large when both C and n are large.  However, the memory requirement in DKB can be reduced using either the overflow bag implementation (DKM) or the approximate buckets implementation (DKA) as described by Cherkassky et al. (1993, p.7).  The overflow bag implementation maintains only a<(C+1) buckets where a is an input parameter.  Only temporarily labeled nodes whose distance labels fall within the range of [a(i), a(i)+a-1] are contained in the buckets at the i-th stage of the algorithm.  Other nodes are maintained in a separate set referred to as the overflow bag.  Initially, the values of i and a(i) are set to 0.  When there is no labeled node left in the given range, i is incremented by one and a(i) is set equal to the minimum of the distance label of the temporarily labeled nodes.  The nodes with distance labels within the new range of [a(i), a(i)+a-1] are moved into their corresponding buckets from the overflow bag, and another cycle of the scanning process begins.
        The Dijkstra's algorithm implemented with approximate buckets (DKA):  In the approximate bucket implementation of the Dijkstra algorithm (DKA), a bucket i contains those temporarily labeled nodes with distance labels within the range of [i*b, (i+1)* b-1], where b is a chosen constant.  Here approximate means that the values of the distance labels in a bucket are not exactly the same as in the case of DKB, but are within a certain range.  Nodes in each bucket are maintained in a FIFO queue.  Algorithm DKA requires a total of largerInteger(C/b)+1 buckets.  The worst case complexity of DKA is O(mb+n(b+C/b)).  It can be seen that this algorithm trades speed for space.  Each node can be scanned more than once, but a node cannot be scanned more than b times.
        The Dijkstra's algorithm implemented with double buckets (DKD): The double bucket implementation of the Dijkstra's algorithm (DKD) combines the ideas of the above two algorithms DKM and DKA.  Two levels of buckets, high-level and low-level, are maintained in the DKD implementation.  A total of d buckets in the low-level buckets are used.  A bucket i in the high-level buckets contains all nodes whose distance labels are within the range of [i*d, (i+1)* d-1].  In addition, a nonempty bucket with the smallest index L is also maintained in the high-level buckets.  A low-level bucket d(j)-L*d maintains nodes whose distance labels are within the range of [L*d, (L+1)* d-1].  Nodes in the low-level buckets are examined during the scanning process.  After all nodes in the low-level buckets are scanned, the value of L is increased.  When the value of L increases, nodes in the nonempty high-level buckets are moved to its corresponding low-level buckets, and the next cycle of scanning process begins.

6. Concluding Remarks

In recent years, we have witnessed an increasing popularity of transportation related decision analysis within a GIS environment (see, e.g., Ralston et al. 1994; Erkut 1996 and Noon et al. 1996). In this type of analysis, the computation of shortest paths is often a central task because shortest path distances are often needed as input for "higher level" models in many transportation analysis problems such as facility location, network flows, vehicle routing and product delivery, just to name a few. In addition, the shortest path problem usually captures the essential elements of more complicated transportation analysis problems. Hence, it can often be used as a benchmark or a starting point for solving more complicated problems in transportation analysis. With the advancement of GIS technology and the availability of high quality road network data, it is possible to conduct transportation analysis concerning large geographic regions within a GIS environment. Sometimes, this type of analysis has to be completed in real time. As a consequence, these analysis tasks demand high performance shortest path algorithms that run fastest on real road networks.
        Although there has been considerable reported research related to the evaluation of the performance of shortest path algorithms, there has been no clear answer as to which algorithm or a set of algorithms runs fastest on real road networks in the literature. A recent evaluation of shortest path algorithms using real road networks has identified a set of three algorithms that run fastest. These three algorithms are: 1) The Graph Growth Algorithms implemented with two queues (TQQ), 2) The Dijkstra's algorithm implemented with approximate buckets (DKA), and 3) The Dijkstra's algorithm implemented with double buckets (DKD). As a sequel to that earlier evaluation, this paper has reviewed and summarized the data structures and procedures related to the three algorithms. This paper provides a direct source that summarizes a set of shortest path algorithms that run fastest on real road networks. This source should be particularly useful for researchers and practitioners whose research and practice are related to the use of shortest path algorithms.

References

Ahuja, R. K., Magnanti, T. L., and Orlin, J. B.(1993) Network Flows: Theory, Algorithms and Applications.  Englewood Cliffs, NJ: Prentice Hall.

Ahuja, R. K., Mehlhorn, K., Orlin, J. B., and Tarjan, R. E. (1990)  Faster algorithms for the Shortest Path Problem.  Journal of Association of Computing Machinery, 37, 213-223.

Cherkassky, B. V., Goldberg, A. V., and Radzik, T. (1993)  Shortest Paths Algorithms:  Theory and Experimental Evaluation.  Technical Report 93-1480, Computer Science Department, Stanford University.

Dial, R. B. (1969)  Algorithm 360: Shortest Path Forest with Topological Ordering.  Communications of the ACM, 12, 632-633.

Dial, R. B., Glover, F., Karney, D., and Klingman, D. (1979)  A Computational Analysis of Alternative Algorithms and Labeling Techniques for Finding Shortest Path Trees.  Networks, 9, 215-248.

Dijkstra, E. W. (1959)  A Note on Two Problems in Connection with Graphs.  Numeriche Mathematik, 1:269-271.

Erkut, E. (1996)  The Road Not Taken.  ORMS Today, 23, 22-28.

Gallo, G., and Pallottino, S. (1988)  Shortest Paths Algorithms. Annals of Operations Research, 13, 3-79.

Glover, F., Klingman, D., and Philips, N. (1985)  A New Polynomially Bounded Shortest Paths Algorithm.  Operations Research, 33, 65-73.

Goldberg, A. V., and Radzik, T. (1993)  A Heuristic Improvement of the Bellman-Ford Algorithm.  Applied Mathematics Letter, 6, 3-6.

Hung, M. H., and Divoky, J. J. (1988)  A Computational Study of Efficient Shortest Path Algorithms.  Computers & Operations Research, 15, 567-576.

Mondou, J.-F., Crainic, T. G., and Nguyen, S. (1991)  Shortest Path Algorithms: A Computational Study with the C Programming Language. Computers & Operations Research, 18, 767-786.

Noon, C. E., Daly, M., and Zhan. F. B. (1996)  Beyond Assessment of Resources: Procurement of Woody Biomass Fuels.  Proceedings of Bioenergy'96 held in Nashville, Tennessee, pp.565-569.

Pallottino, S. (1984)  Shortest-Path Methods: Complexity, Interrelations and New Propositions.  Networks, 14, 257-267.

Pape, U. (1974)  Implementation and Efficiency of Moore Algorithms for the Shortest Root Problem.  Mathematical Programming, 7, 212-222.

Ralston, B., Tharakan, G., and Liu, C. (1994)  A Spatial Decision Support System for Transportation Policy Analysis.  Journal of Transport Geography, 2, 101-110

Sedgewick, R. (1990)  Algorithms in C.  Reading, MA: Addison Wesley Publishing Company.

Zhan, F. B. (1995)  Shortest Path Algorithms: Evaluation and Implementation in C.  Research Report.  Department of Geography and Planning, Southwest Texas State University, San Marcos, TX.

Zhan, F. B., and Noon, C. E. (1996)  Shortest Path Algorithms: An Evaluation Using Real Road Networks.  Transportation Science (in press).


ContentsJGIDA vol.1, no.1 Journal of Geographic Information and Decision AnalysisJGIDA Home