Click here to Skip to main content
15,860,859 members
Articles / Programming Languages / C# 4.0

Shortest/Fastest Path For Large Amount of Nodes - Part 1 of 3 - Djikstra without Arrays

Rate me:
Please Sign up or sign in to vote.
4.79/5 (26 votes)
4 May 2010CPOL9 min read 44K   1.3K   73   16
Lowering the memory consumption to allow Djikstra's algorithm to handle large amount of nodes.

Introduction

The main objective of this article is to propose a way to use Dijkstra's algorithm without the memory limitations imposed by a huge static allocated bi-dimensional array, allowing it to handle a large set of nodes.

As "large set" I mean something from 100 to 1000 nodes. With lesser than 100 nodes, you can use any solution that uses arrays found all around the web.

The code we propose is not limited to 1000 nodes (actually I've not yet found a upper limit), but after 1000 nodes the time to find a solution is not acceptable.

The second part of this article will propose a solution that will be able to handle millions of node with a very low processing. 

Background  

Djikstra is a very generic greedy algorithm and it's solution includes the worst case scenario: A graph where every node can be connected to all other nodes. 

But after studding many problems, I found that most of them are based on "2D map like" sets of nodes.  In this scenario the connectivity between the nodes are very low. Nodes are normally connected only to nodes immediately around it creating a very sparse link matrix.

The basic Djikstra's algorithm that exists in the web does not take that in consideration and allocates resources to map all possible connections between nodes, even to itself.

Let's analyze this sample grid:

Sample Grid

It represents a 15x12 "2D map like" grid. This grid is the one used in the sample source code available to download above.

Now... if we consider it as a full connected matrix we would need 180 (15x12) nodes to represent it. That would lead to a link matrix with 32400 (1802) decimal (or integer) values to feed Djikstra. All the codes I've download from the web crashed ("memory overflow") when I tried to produce a link matrix with more than 100 nodes. 

If the program does not crash the allocated memory will make the code (and all the machine) slow.

The matrix size is always the square of it's number of nodes. That means that  the problem grows exponentially. So even if you get a very huge and powerful machine soon you will get to a point where you will not be able to even process it.

So... let's solve this problem first...

Preparing the Data   

First thing to do is to notice that not every point in the grid is used. There are voids. So instead of starting from the grid dimensions we will build a list of nodes. This reduces the problem a little bit: from 180 to (in this sample, of course) 120 nodes.

That didn't help much, but it's a start...

We also notice that the nodes are not completely connected and not every connection is bidirectional. We can reduce the problem to the links we really use. That means reducing the entire problem to only 246 links (again, in this sample). 

Now we are talking…

In my studies I found that the total number of links normally stays between twice to three times the number of nodes. Surprise! The problem became linear!

At this point we became aware that the main problem is not Djikstra’s algorithm itself but actually preparing the input data to use with it. But we still can improve the code to avoid too much memory consumption. We will talk about it later. 

With that in mind, let’s see an extract from the input file I used on the source code sample:

120
0	4
1	4
2	1
2	2
2	3
2	4
2	5
2	6
2	7
2	8
3	0
3	1
...
246
0	1	0.0625	4.5
1	0	0.0625	4.5
1	5	0.0625	4.5
2	11	0.0625	4.5
3	2	0.0625	4.5
4	3	0.0625	4.5
4	13	0.0625	3.0
5	1	0.0625	4.5
5	4	0.0625	4.5
5	14	0.0625	3.0
6	5	0.0625	4.5
7	6	0.0625	4.5
8	7	0.0625	4.5
8	17	0.0625	2.0
8	9	0.0625	4.5
9	18	0.0625	4.5
...
5
7	65
18	98
...

The first part is the list of nodes. It describes the point coordinate in the grid. It's only necessary if you want to plot the grid. Otherwise you will need only the second part. This one is a "table like" list that shows the first node, the second node and any amount of data that connects the two. In this sample, I’m giving a distance in miles and a time in minutes. So, the line... 

0	1	0.0625	4.5

... means that the node 0 is connected to the node 1 and their distance is 0.0625 miles and it takes 4.5 minutes to go from 0 to 1. 

The next line... 

1	0	0.0625	4.5

... says that there is also a connection from 1 to 0 making it bidirectional. The values are the same but it's not necessarily true. You could have different values if you are coming or going. 

The last part of the input just gives a list of cases we want to use to get some results over the code. So... The section that really matters in the input is the second one. That data can come from anywhere, a database, a GPS batch file, GoogleMaps.

Let's see the code...

The whole solution can be found in only two lines of the code:

C#
...
        _distanceSolver = new Dijkstra(_numberOfNodes, GetDistance); // line 74
...
            var solution = _distanceSolver.Solve(_cases[i].StartId, _cases[i].EndId); // line 83
...

The constructor of the Dijkstra class is very simple and just initializes the internal fields... 

C#
public Dijkstra(int numberOfNodes, Func<int, int, decimal> values)
{
    if (numberOfNodes < 0) throw new ArgumentException("The number of nodes must be positive.");
    if (values == null) throw new ArgumentNullException("values");

    NumberOfNodes = numberOfNodes;
    _values = values;
}

But it holds the main feature of this solution. By passing the values as a delegated function to the class, no static matrix is required at all. 

The values are fetched "on demand". And you have total control over the input and therefore over what is served to Djikstra. 

You just need to keep in mind that Djikstra is a cumulative algorithm so all valid values must be non negative. A value of -1 means that those nodes are not connected, but in our way of representing the nodes the better thing to do would be just take them out of the list. 

That control gives some important advantages: 

1 - All validation can be made inside the function. No exceptions need to be generated during the solution interactions, just return -1. 

2 - The values really do not need to be static. You can build functions that alter the output of the function depending of some extra information. 

For instance, if you want that the GetTime output changes along the the day (due traffic for exemple), maybe the GetTime function could have this signature: Func<int, int, DateTime, decimal>. 

Imagine that you would normally take 2 hours to go from one place to another, but in the middle of the way (after 1 hour)  the rush hour starts. The values for time you started with are not valid anymore and have to be adjusted. That 2 hour trip (in normal traffic) became a 2,5 hour trip (with part of it during rush hour). That's a more precise output. 

The possibilities are endless. 

The most important part of this code is not exposed. Is the Interaction function.

C#
private void Iteraction()
{
    var source = (from r in _nodes
                  where !r.Done
                     && r.Path.Any()
                  orderby r.Path.Sum(p => p.Value)
                  select r).First();

    var connectedNodes = (from r in _nodes
                          where r.Node != source.Node
                             && r.Node != _startNode
                             && _values(source.Node, r.Node) != -1
                             && (!r.Path.Any() || r.Path.Sum(v => v.Value) > (source.Path.Sum(p => p.Value) + _values(source.Node, r.Node)))
                          select new { node = r, newValue = _values(source.Node, r.Node) }).ToList();

    connectedNodes.ForEach(i => {
        i.node.SetPath(source.Path);
        i.node.AddStep(source.Node, i.newValue);
    });

    source.Done = true;
}

This is where Djikstra magic occurs and its logic was almost not changed, just improved with LINq and without any explicit static arrays. 

The only list that is always allocated during the life span of the solution is the internal _nodes list. That list has the maximum dimension of NumberOfNodes. So it's really tiny. 

Parallelism may be an alternative here but I will talk about that in the next part of this article.

One final thing I'd like to mention here is the return value of the Solve funtion.

C#
public IEnumerable<int> Solve(int startNode, int endNode)
{
    if (startNode < 0 || startNode >= NumberOfNodes) throw new ArgumentException("The start node must be between zero and the number of nodes");
    if (endNode < 0 || endNode >= NumberOfNodes) throw new ArgumentException("The end node must be between zero and the number of nodes");

    _startNode = startNode;
    Reset();
    for (var i = 1; i < NumberOfNodes; i++) Iteraction();
    GetNode(endNode).AddStep(endNode, 0);
    return GetNode(endNode).Path.Select(p => p.Node);
}

I've tried many outputs but I found out that the best one (for me) was the list of nodes holding the calculated path. 

Since I have the function(s) that gives me the values between two nodes, if I receive the chain of nodes that connects the start to the end, I can zip then and fetch the values for each step individually. An even apply a different function to compare results. 

That's what is done in the solution available to download. 

Well... Here is the complete code of the Dijkstra class:

C#
public class Dijkstra
{
    private int _startNode = 0;
    private List<destination> _nodes;
    private Func<int, int, decimal> _values;

    public int NumberOfNodes { get; private set; }

    public bool IsSolved { get { return !_nodes.Any(n => !n.Done); } }

    public Dijkstra(int numberOfNodes, Func<int, int, decimal> values)
    {
        if (numberOfNodes < 0) throw new ArgumentException("The number of nodes must be positive.");
        if (values == null) throw new ArgumentNullException("values");

        NumberOfNodes = numberOfNodes;
        _values = values;
    }

    public void Reset()
    {
        if (_nodes == null) _nodes = new List<destination>();
        _nodes.Clear();
        for (var node = 0; node < NumberOfNodes; node++)
        {
            _nodes.Add(new Destination(node, node == _startNode));
            var value = _values(_startNode, node);
            if (value != -1) GetNode(node).AddStep(_startNode, value);
        }
    }

    private Destination GetNode(int n)
    {
        return _nodes.SingleOrDefault(i => i.Node == n);
    }

    private void Iteraction()
    {
        var source = (from r in _nodes
                      where !r.Done
                         && r.Path.Any()
                      orderby r.Path.Sum(p => p.Value)
                      select r).First();

        var connectedNodes = (from r in _nodes
                              where r.Node != source.Node
                                 && r.Node != _startNode
                                 && _values(source.Node, r.Node) != -1
                                 && (!r.Path.Any() || r.Path.Sum(v => v.Value) > (source.Path.Sum(p => p.Value) + _values(source.Node, r.Node)))
                              select new { node = r, newValue = _values(source.Node, r.Node) }).ToList();

        connectedNodes.ForEach(i => {
            i.node.SetPath(source.Path);
            i.node.AddStep(source.Node, i.newValue);
        });

        source.Done = true;
    }

    public IEnumerable<int> Solve(int startNode, int endNode)
    {
        if (startNode < 0 || startNode >= NumberOfNodes) throw new ArgumentException("The start node must be between zero and the number of nodes");
        if (endNode < 0 || endNode >= NumberOfNodes) throw new ArgumentException("The end node must be between zero and the number of nodes");

        _startNode = startNode;
        Reset();
        for (var i = 1; i < NumberOfNodes; i++) Iteraction();
        GetNode(endNode).AddStep(endNode, 0);
        return GetNode(endNode).Path.Select(p => p.Node);
    }

    private class Destination
    {
        public Destination(int n, bool d)
        {
            Node = n;
            Path = new List<origin>();
            Done = d;
        }

        public int Node { get; private set; }
        public List<origin> Path { get; private set; }
        public bool Done { get; internal set; }

        public void AddStep(int d, decimal v)
        {
            Path.Add(new Origin(d, v));
        }

        public void SetPath(List<origin> p)
        {
            Path = new List<origin>(p);
        }
    }

    private class Origin
    {
        public Origin(int n, decimal v)
        {
            Node = n;
            Value = v;
        }

        public int Node { get; private set; }
        public decimal Value { get; internal set; }
    }
}

That code can handle a really large set of nodes. 

The Output

In the solution available to download it's shown how to use two different functions: One to fetch the shortest and the other to fetch the fastest path.  You will see they (almost always) are not the same. 

Before running the code in DEBUG mode, change your console output to 160 characters per line for a better view. It will show a representation of the link matrix. That is just to demonstrate how sparse it is. Each character is one connection between two nodes. It represents a 120x120 character matrix with a blank where there is no connection and a dot representing an existing connection. Notice how the connections go side by side the main diagonal. It means that all nodes almost always connect to other nearby node. Also the main diagonal itself has no dots meaning that there is no connection between the node and itself.   

That clearly shows how much memory is wasted if we allocate a full static bi-dimensional array.

If run it in the RELEASE mode the output will be send only to a file. That will show how fast this code can be.  

Here is a sample of the DEBUG mode output:

Link Matrix:
 .
.    .
           .
  .
   .         .
 .  .         .
     .
      .
       . .       .
                  .
           .
          . .         .
   .         .
    .         .         .
               .         .
      .         .
                 .         .
        .         .         .
                             .
                  . .
                   . .
                    .
                       .         .
            .         . .
             .         . .         .
                        . .         .
               .         . .
                          . .         .
                 .         . .         .
                            . .         .
                   .         . .
                              .
                                 .
                                .         .
                       .         .
                        .         .         .
                                   .
                          .         .
                                     .        .
                            .         .        .
                                       .        .
                              .         .
                                           .        .
                                  .         .
                                   .                  .
                                              .
                                               .        .
                                       .        .        .
                                                 .        .
                                         .        .
                                                            .
                                                    .
                                                   .          .
                                           .          .
                                            .        .          .
                                             .
                                                       .           .
                                               .        .           .
                                                         .           .
                                                 .        .
                                                           . .
                                                            .
                                                               .        .
                                                     .          .
                                                      .          .        .
                                                                  .        .
                                                       .           .
                                                                    .        .
                                                         .           .        .
                                                                      .        .
                                                           .
                                                                        .
                                                                       .         .
                                                               .        .
                                                                .        .         .
                                                                          .         .
                                                                  .        .
                                                                            .         .
                                                                    .        .         .
                                                                              .         .
                                                                      .        .
                                                                                  .        .
                                                                         .         .
                                                                          .         .        .
                                                                                     .        .
                                                                            .         .
                                                                                       .        .
                                                                              .         .        .
                                                                                         .        .
                                                                                .
                                                                                           .
                                                                                          . .       .
                                                                                  .        . .
                                                                                   .        . .       .
                                                                                             . .       .
                                                                                     .        . .
                                                                                               . .       .
                                                                                       .        . .       .
                                                                                                 . .       .
                                                                                         .        .
                                                                                                     .
                                                                                            .         .       .
                                                                                             .
                                                                                                      .         .
                                                                                               .       .
                                                                                                        . .
                                                                                                 .         .
                                                                                                            .
                                                                                                   .               .
                                                                                                              .
                                                                                                             . .
                                                                                                      .             .
                                                                                                                     .
                                                                                                        .         .
                                                                                                         .
                                                                                                            .          .
                                                                                                                     .
                                                                                                                      .
                                                                                                                 .
                                                                                                                   .

Case 1:
Shortest: 7->6->5->4->13->24->35->44->54->64->65: 0,625 miles, 29,5 minutes; Solution Time:0:00:00:00,0900000;
Fastest : 7->6->5->14->25->24->35->44->54->64->65: 0,625 miles, 28,0 minutes; Solution Time:0:00:00:00,0810000;
Case 2:
Shortest: 18->29->40->48->58->69->79->88->98: 0,500 miles, 25,5 minutes; Solution Time:0:00:00:00,0800000;
Fastest : 18->29->28->39->47->57->68->78->87->97->98: 0,625 miles, 21,0 minutes; Solution Time:0:00:00:00,0810000;
Case 3:
Shortest: 10->11->12->13->14->15->16->17->18->29->40->48->49->50->60->61: 0,938 miles, 52,5 minutes; Solution Time:0:00:00:00,0810000;
Fastest : 10->11->12->13->24->25->26->27->28->39->47->48->49->50->60->61: 0,938 miles, 44,0 minutes; Solution Time:0:00:00:00,0810000;
Case 4:
Shortest: 2->11->12->13->14->15->16->17->18->29->40->48->58->69->79->88->98->107->108->115->119: 1,250 miles, 67,5 minutes; Solution Time:0:00:00:00,0810000;
Fastest : 2->11->12->13->24->25->26->27->28->39->47->57->68->78->87->97->98->107->108->115->119: 1,250 miles, 53,0 minutes; Solution Time:0:00:00:00,0810000;
Case 5:
Shortest: 81->82->73->63->53->43->34->23->24->25->26->27->38: 0,750 miles, 32,0 minutes; Solution Time:0:00:00:00,0810000;
Fastest : 81->82->83->74->64->54->44->35->24->25->26->27->38: 0,750 miles, 27,0 minutes; Solution Time:0:00:00:00,0810000;

A Chalenge 

I haven't found yet the memory limitations of that code. The time limitation for a single fetch starts to appear if you go above 1000 nodes (not including the time used to load the data). 

Here is a table with the approximated times I obtained for a NxN 2D map with all connections set as bidirectional:  

# of Links # of IteractionsTotal Time (milliseconds)Average Iteraction Time (milliseconds)
360100800.8
8402256602.9
152040036409.1
24006251294020.7
34809003780042.0
476012249680079.1

So now is the processing time that is increasing exponentially.

If someone want to try to find memory limit please don't forget to post here your benchmark result also. 

What is Next? 

This article deals with a imaginary grid with only 120 nodes that gives us around 240 links. In the real world even a small neighborhood in a small city can have around 10000 nodes. Any simple game map or maze will have much more than that.  

And what happens when we need to go to a even greater area. For instance how to connect a place in New York with one in Los Angeles. Imagine the number of nodes. After a very raw and superficial calculation I’ve got that this number is in the order of 1010 (!!) nodes.

Not even the best machine could do that in a reasonable time with the code above. We know that Google does that. So... Where is the trick?

In the next article I will talk about one possible approach to solve that kind of problem. 

Well... That's all for now folks...  

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Pronto Solutions
Brazil Brazil
André Vianna is a Software Engineer in Rio de Janeiro, Brasil.
He is PMP, MCP and have worked as Development Project Manager, Senior Software Developer and Consultant for more than 15 years.
He worked with several major companies in Brasil, EUA, Canada and Mexico.
He has been working with the cut edge .NET platform since 2001 and is a C# specialist.

Comments and Discussions

 
QuestionThanks Pin
Hooman_Kh10-May-15 11:30
Hooman_Kh10-May-15 11:30 
QuestionParts 2 and 3? Pin
Amarnath S25-Apr-13 23:27
professionalAmarnath S25-Apr-13 23:27 
QuestionVery helpful! Pin
Member 278499112-Dec-11 2:07
Member 278499112-Dec-11 2:07 
GeneralChanging Article Level Pin
Andre Vianna4-May-10 13:33
Andre Vianna4-May-10 13:33 
NewsInformative Pin
kaveh0964-May-10 11:25
kaveh0964-May-10 11:25 
GeneralRe: Informative Pin
Andre Vianna4-May-10 12:42
Andre Vianna4-May-10 12:42 
GeneralMy vote of 2 Pin
wtwhite26-Apr-10 22:01
wtwhite26-Apr-10 22:01 
GeneralRe: My vote of 2 Pin
wtwhite26-Apr-10 22:02
wtwhite26-Apr-10 22:02 
GeneralRe: My vote of 2 Pin
Andre Vianna28-Apr-10 3:26
Andre Vianna28-Apr-10 3:26 
GeneralRe: My vote of 2 Pin
wtwhite30-Apr-10 16:08
wtwhite30-Apr-10 16:08 
GeneralRe: My vote of 2 Pin
Andre Vianna3-May-10 3:25
Andre Vianna3-May-10 3:25 
Hi again =)

I'm really enjoying our conversation. And I loved the sample you've sent. I will use it to show how I see the problem.

Let's consider that the distances between all nodes are contant equal 100 miles.

As you said the values represent the time in hours between the nodes. So to node A to B we are going in 100m/h, in nodes A to D and D to B at 50m/h. In node B to C during the first 4 hours we travel at 1m/h (maybe due a traffic accident) after that we go again at 100m/h.

We really never stoped moving.

So... If we come from A to B we do not stop at B and wait 3 hours. Actually we continue to C for 3 hours at 1m/h after 3 miles the traffic is liberated and we go the rest of the 97 miles at 100m/h... (0,97 hours or 58 minutes and 12 seconds). The total time is 4 hours 58 minutes as 12 seconds.

Now... if we come to B from D. We have spend 4 hours to get there and we spend another hour to go to C with a total of 5 hours.

So the fastest path is A -> B -> C with a final result of 4,97 (wich is less than 5 hours going by the other path). Remember that Time and Speed are a continuous functions so the code must be adapted the function to consider that and server the right values to the Djikstra engine.

Now. If you want to express that the subject will stop at the B node until the proposed time has passed it will be necessary to change the set of rules to:

C#
if t < 4 then 100 //1 hour from A to B + 99 hours waiting at B node
else 1               //after that you can go free.
    |
    |       1
A-------B-------C
 \     /
2 \   / 2
   \ /
    D


Running Djikstra with this new set of rules the result will be the path A -> D -> B -> C with a total of 5 hours, ignoring the other path.

Notice that if you say that the wait at the B will last only until the traffic is freed the problem should be expressed as this:

C#
if t < 4 then 4 //1 hour from A to B + 3 hours waiting at node B until the traffic is free again
else 1             //after that is no waiting time.
    |
    |       1
A-------B-------C
 \     /
2 \   / 2
   \ /
    D


In this case there is no difference between the paths. Both takes 5 hours. But any small change will give a different answer. Let's just change the time in the A -> D path, like this:

C#
   if t < 4 then 4 else 1 
       |
       |       1
   A-------B-------C
    \     /
   1 \   / 2
      \ /
       D
Result: A -> D -> B -> C

   if t < 4 then 4 else 1
       |
       |       1
   A-------B-------C
    \     /
   3 \   / 2
      \ /
       D
Result: A -> B -> C


Can you see that the point here is how the problem is interpreted or modeled?

That modeling work is part of the solution. And during that you might be able to find if Djikstra (and by consequence this code) applies or not.

Regards
André Vianna
GeneralRe: My vote of 2 Pin
wtwhite3-May-10 20:15
wtwhite3-May-10 20:15 
GeneralRegarding the "Advanced" category Pin
torial5-May-10 8:48
torial5-May-10 8:48 
GeneralDijkstra implementations Pin
Maniezzo26-Apr-10 21:10
professionalManiezzo26-Apr-10 21:10 
Generalgreat stuff! Pin
Herre Kuijpers20-Apr-10 10:59
Herre Kuijpers20-Apr-10 10:59 
GeneralRe: great stuff! Pin
Andre Vianna20-Apr-10 11:29
Andre Vianna20-Apr-10 11:29 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.