The Travelling Salesman (or Saleswoman) Problem

Poor Willy Loman

Imagine you're a sales rep who needs to visit several customers scattered across the country. You want to visit each city just once, in the shortest time or distance and return to your home base. This is a widely studied optimization problem that has many applications. It is "NP hard" meaning optimal solutions to real world (big) problems can be prohibitively expensive in terms of computer resources, As a result, much work has been done to develop algorithms that get near optimal solutions in a feasible amount of time.

I will cover several approaches to this problem using an example from [1] where a prospective college student wants to visit several college campuses across the US before deciding which to attend.

All of the Python scripts and supporting files used for this this post can be found here. The example forms a "complete graph" i.e. all colleges are connected to every other college. All of the Python scripts assume a complete graph.

The College Tour

Your applications have been accepted by 20 colleges across the US (Congratulations!). Mom is willing to tag along and help with the driving but she wants to keep the road-trip miles to a minimum.

Before we try to optimize our route we need to decide how we will measure the distance between the colleges. We could compute the Euclidean distance between each pair of locations given their x, y coordinates. This may under estimate distance since highways rarely take a direct path between points. A better method is to pre-determine the mileage between city pairs and enter this data into an adjacency matrix. Here is a portion of the adjacency matrix for our college tour example:

This is a square, symmetric matrix with element(i, j) equal to the distance from college i to j. This allows us to determine distance however we choose. For example, by using Google maps. It also relieves the algorithm of needless computation work for each version of the route it evaluates.

Naïve or Brute Force

For smaller problems we can use a brute force approach. Determine all possible route permutations and evaluate them to determine the shortest. This has the advantage that optimality is assured. Unfortunately, it has a complexity of \(O(n!)\) so our 20 college problem could have up to \(2.43\times 10^{18}\) cycles. I need a faster PC or a lot more patience to wait for this to solve the 20 college example. For a Python implementation of brute force see this script.

Nearest Neighbor

Perhaps the simplest and fastest algorithm for TSP is called Nearest Neighbor (NN). From the starting point or last city visited, the algorithm looks for the closest unvisited location. This repeats until all locations have been visited. The choice of starting point affects the result. Optimality is not guaranteed.

I found a Numpy based implementation on Stack Overflow that solves the 20 college problem in under half a milli-second. With this speed it is feasible to try all the colleges as starting points.

Evaluating all of the colleges as a starting point took only 3.1 msec. Starting at college 8 (Michigan) yielded the shortest path. From this starting point, the NN algorithm yields the following result:

10,163 miles. Maybe driving to all these colleges wasn't a great idea. We'll assume another pandemic is raging and air travel is off the table. The route doesn't seem too bad but maybe we can do better. Note: lines between colleges indicate connections, not the actual path taken while driving.

Geometric Algorithm

This algorithm is described in [1]. We start with a "convex hull"[2] connecting the outer most points of the problem:

Given

A = set of vertices that lie on the hull

C = edges that lie on hull

T = set of vertices not on hull

The algorithm then proceeds as follows:

Pick a vertex v from T (the interior set) closest to the convex hull and create all of the edges incident with v and the vertices in A (the hull set).
Measure all of the angles created this way. Let v, u and v, w be the edges that form the largest angle centered at v.
Add these edges to C, destroy the rest of the edges we created in step 1 and remove the edge u, w from C.
Add v to A
Select the next vertex in T and repeat steps 1-4 until T is empty.

Please see [1] for a much better explanation of how this works. To make the results comparable with other TSP algorithms, I used values from our adjacency matrix to compute route distances (i.e. non-geometric). In step 2 we use interior angles to choose the next vertex to add which implies straight edges connecting the vertices. In spite of this potential discrepancy, the algorithm works well for this example. The geometric solution for this problem is:

The run time for this on my PC is 4.7 msec. We're down to less than 9,500 miles and the route seems intuitively better than the NN solution.

Closest Insertion

Also reviewed in [1], this algorithm is similar to nearest neighbor but instead of only looking for the closest location from the end of the path, it also searches the locations near the beginning of the path. In other words, the route can grow forwards or backwards.

Like NN, closest insertion results are affected by the choice of starting point. For our college tour example, evaluating all the possible starting points took ~36. msec. I believe this code is slower than the NN code mainly because it does not take advantage of Numpy modules.

Starting at Louisville gave the shortest route; 9,721 miles. Not quite as good as the geometric solution but better than NN.

Christofides Algorithm

Another popular heuristic for finding approximate solutions to the TSP is Christofides algorithm. It has been mathematically proven to give solutions no worse that 50% higher than optimal. The Python implementation I used ran a little faster than the geometric algorithm (3.7 msec) but found a route that was 10,131 miles long. Almost as long as the NN solution.

Held-Karp Algorithm

The Held-Karp algorithm uses Dynamic Programming to solve the TSP problem. It finds the optimal solution in exponential time: \(O(2^{n}n^{2})\). Dynamic programming is too complex for me to summarize here so I'll show the results from a Held-Karp DP script that uses memoization to improve speed. The optimal solution to our example is 9,275 miles. On my PC this took 51.3 seconds. Much longer than the approximate algorithms above but still under a minute and it yields the optimal solution. I'll show a graphic of the optimal route in the Integer Program section below.

Genetic Algorithm

Evolutionary or "Genetic" algorithms are often used to get approximate solutions to TSP's. They can handle larger size problems but might not find an optimal solution. The basic steps for a GA are:

Create the population (randomly)
Determine fitness (determine distance for each route from 1.)
Select the mating pool (including elites and others based on probability)
Breed (gene crossover from parents)
Mutate (for diversity)
Repeat

See [3] for a gentle introduction on the use of genetic algorithms to solve the TSP. There are many ways to customize a GA; crossover method, mutation rates, etc. The choices and settings made in my script are the result of a lot of experimentation.

Here is a typical result. The GA script ran for 390 generations [4] and took 42.7 seconds. I found that if I ran the script multiple times, I could eventually get the optimal solution. This is due to the randomness used to create the populations for each generation and the parent selection process. A larger population size can improve the chance of finding an optimal solution at the cost of longer run times.

After running my GA script a few times, I got the optimal route of 9,275 miles in just 142 generations. In theory, If you allow the algorithm to run for enough generations it will achieve a near optimal result. I found that my implementation seemed to get "stuck" once it had a near-optimal solution like the 10,305 mile route shown above.

Integer Programming

I saved the best for last. "An integer programming problem is a mathematical optimization program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective function and the constraints (other than the integer constraints) are linear."[Wikipedia]

Using a Python MIP solver script, I am able to find the optimal route in just 3.2 seconds. While optimality is not guaranteed, ILP always finds the optimum route for our college tour example.

Other Methods

There are several other algorithms used to find solutions to the TSP. I may expand this post in the future to show how they stack up.

Summary

* GA runtime and iterations vary due to randomness in method. Multiple runs may be needed to achieve optimal result.

Distance and time ratios are used to show how each algorithm performed vs. the fastest (time) or shortest (distance) results. The comparisons are valid for this example/problem size. A smaller or larger problem may indicate a different algorithm choice.

GA, ILP and Held-Karp can all achieve optimal solutions but GA might take multiple runs to get there. Held-Karp assures optimality but takes close to a minute to solve this 20 college problem. ILP is much faster than GA and Held-Karp and always finds the optimal solution.

Closest Insertion, Geometric, Christofides and NN produced near optimal solutions in tiny fractions of the time required for the others. This would make them a good choice for large scale TSPs.

[1] "THE TRAVELING SALESMAN PROBLEM", Thesis by Corinne Brucato,

University of Pittsburgh, 2013

[2] The convex hull of a shape is the smallest convex set that contains it.

[3] Evolution of a salesman: A complete genetic algorithm tutorial for Python, Eric Stolz

[4] 240 generations to achieve the final route plus 150 generations to establish no further improvement was likely.

Search This Blog

Optimization with Python