# Project #7: Solving the Traveling Salesman Problem

## Overview

In this project, you will choose and implement an algorithm to find good solutions to the traveling salesman problem (TSP) within a given time limit.

## Objectives

• Solve (or at least approximately solve) an NP-hard problem
• Implement an algorithm for finding good solutions to the TSP (Optimization variant)
• Further develop your ability to conduct empirical analysis
• Gain experience with time/space trade-offs
• Compete for fame and glory in the TSP hall of fame
• Mature as problem-solvers

## Definitions

### TSP

The Traveling Salesman Problem (TSP) optimization problem is defined as follows:

• Given: a directed graph and a cost associated with each edge.
• Return: the lowest cost complete simple tour of the graph.

A complete simple tour is a path through the graph that visits every vertex in the graph exactly once and ends at the starting point; such a tour is also known as a Hamiltonian cycle or Rudrata cycle. The TSP problem is an optimization problem, since we are searching for a simple tour with minimum cost.

### Completeness and Optimality

Whatever your design choices, it is important that you consider and understand several properties of your chosen algorithm: completeness (i.e., is your search guaranteed to find a solution when one exists?) and optimality (i.e., given sufficient time, does your algorithm find the optimal solution?).

If you choose a branch and bound state-space search, then your bounding function should be a lower bound in order to guarantee the optimality of your algorithm. If your B&B search is optimal, then it is also correct.

## Teams

You may optionally team up with another class member to complete this project. At most, teams of two members are permitted. You may also complete the project alone, if you wish, but teaming up is highly encouraged. Be sure that if you team up with someone that you are both committed to sharing the load.

Complete the Whiteboard Experience with your team-mate. (Thus, teams must be formed before the whiteboard experience.)

## Provided Framework

We provide a Graphical User Interface that generates a specified number of random points (the cities) on a 2-D GDI canvas. The problems are generated randomly when you click the “new problem” button, and you can control the problem size using the “Problem Size” field, as shown in the following figure:

Clicking the “new problem” button again will generate another new problem.

Setting the “Seed” field allows you to reset the random number generator and generate a new problem from a known state of the generator. This allows you to create a problem in common with other students for the purpose of evaluation. In the following figure, the seed 17 has been specified. The GUI provides a hook (the “Run” button) which calls the method that you are going to implement, namely ProblemAndSolver.solveProblem(). The current implementation provides a random tour. The following figure also illustrates that the “Run” button has been clicked, and a random tour among the cities (vertices) has been chosen:

Also worth noting is that the “Problem” number has been incremented to “1” to indicate that this is the first problem generated after setting the seed to 17. Subsequent “new problems” will be numbered incrementally from 1. This is an important detail for generating the correct problem for your report.

A word of clarification: the GUI currently displays a message in red text as follows: “n© means this node is the nth node in the current solution and incurs cost c to travel to the next node.” The “n” indicates the node's position in the displayed tour, and the “©” is the weight on the next edge in the tour. We elected to show the edge weight next to the vertex at the head-end of the edge so that it is clear with which the weight is associated. Distances are computed using the Euclidean distance.

Furthermore, the word “node” is ill-chosen, and our GUI should use the term “vertex” instead (TODO).

Be sure that you are using the latest .NET framework: Bring up the properties of your project (can use the key-chord Alt+F7). For the property “Target framework:”, specify .NET Framework 4 (or higher).

## Required: Whiteboard Experience (TM)

You are developing as a problem solver. You are also learning to solve a problem before you start writing code. In this project, we require you to step up to a whiteboard after reading these instructions before writing any code. Use the whiteboard to walk through the solution with your team-mate. If operating solo, you could present to a friend, who could be another CS major, a CS 312 classmate, or another technical person with an eye for detail.

State clearly your choice of algorithm. Make sure you sketch out and understand your chosen algorithm. If it is a solution-space algorithm, understand your algorithm. In particular, for local search, understand the neighborhoods produced by your operators. If it is a branch and bound algorithm, then choose and understand your state space, your bound function, your agenda, and every other ingredient necessary to do branch and bound on the TSP (see Appendix 1). Simulate a few steps of your algorithm on the board. Think out loud and with sketches about the contents of your data structures for simple examples.

In short, make sure you understand what you are going to do before you write a stitch of code. As you will note below, including a photo of your “whiteboard experience” is a required element of your project report.

Have a “whiteboard experience” and document it with a photograph of you next to the whiteboard sketches, etc.

Prepare a whiteboard experience report as a .pdf file. It should consist of one page with the following elements:

• [5 points] a caption (2-3 sentences) explaining why you are ready to proceed to write code
• the names of the people in your audience

Submit the whiteboard experience report online by the whiteboard experience deadline for this project.

## To Do

1. Have a “whiteboard experience” as directed above, and submit it online by the deadline for the whiteboard experience.
2. Design and write your main algorithm for solving the TSP: find the shortest complete simple tour through the City objects in the array Cities in the given time limit.
• The time limit is 60 seconds. Your algorithm must compute as good a solution as possible within the time limit.
• Possible algorithms: Branch and Bound, Local Search (e.g., with the 2-change operator), Local Search with random restarts, Deterministic Annealing, Simulated Annealing, Genetic Algorithms.
• If you choose to work with Branch and Bound,
• Use the final pseudo-code in lecture #35 (either the eager pruning or the lazy pruning version). If you are tempted to change the control flow of the pseudo-code, then you may be breaking branch and bound and disqualifying your algorithm. Check with instructor if you have questions.
• Select an agenda that will encourage probes and exploration. You do not want breadth-first-type behavior. For that reason, a queue is not permitted as an agenda on the project. If you choose a simple priority queue, then you cannot use the bound function alone as your priority but must come up with a priority function that will encourage probes. Breadth-first-type behavior rarely (if ever) updates the BSSF for even moderate numbers of cities.
• If you have another idea for an algorithm, then you may ask the instructor for permission to try it.
3. Implement your solver in the following method: ProblemAndSolver.solveProblem().
• Document your algorithm with legible comments in important places in the code that make it obvious to the reader what your solution is doing. This documentation also provides evidence of your comprehension. With the aid of adequate documentation, the correctness of your approach should be easy for the reader (e.g., TA) to verify. If the TA cannot easily determine the code's correctness, then fewer points will be awarded.
4. Your solver must include a time-out mechanism so that it will terminate and report the best solution so far after 60 seconds of execution time.
• Run a timer and check for time-out synchronously on every iteration through your inner-most loop. That is sufficient but potentially imprecise and will probably yield small delays (you may slightly exceed 60 seconds). Avoid doing work past the time-out.
• You could figure out how to use a timer to send an asynchronous event and interrupt your search at precisely 60 seconds.
• Start the timer as soon as you start to run your solution. If you are using B&B, then your solution includes the code for computing the initial BSSF. Therefore, you care how much time your initial BSSF requires, because the clock is ticking.
5. To display your solution, assign your best solution to a TSPSolution that contains the path you have discovered. Then call the Program.MainForm.Invalidate() method to force the display to refresh.
6. Set Program.MainForm.tbCostOfTour.Text to the cost of the tour you have discovered. Set Program.MainForm.tbElapsedTime.Text to the time that it took you to discover the solution.
• Remember that your program will run more quickly if you:
• Build and run the “release” version (rather than debug)
7. Check whether your algorithm is correct and optimal using the following two examples:
• Problem #1 of size 10, seed 1 has optimal tour cost 2524.
• Problem #1 of size 15, seed 1 has optimal tour cost 3577.
• If your algorithm is not guaranteed to be optimal, you may not find these results.
8. Prepare to report the space efficiency of your algorithm: implement a mechanism to report the maximum size (high water mark) of your agenda (in the case of B&B), your population (in the case of genetic algorithms), or any other “holding tank” required by your algorithm for each problem instance. Collect data for the report.
9. Collect your results as required in the Report section below.
10. Be sure to verify that your reported solutions are indeed valid tours (simple cycles visiting all cities).
11. Implement a baseline algorithm for comparison with your chosen search algorithm.
• A suitable baseline may be a greedy algorithm, a random algorithm, or some other similarly simple approach.
12. After submitting your project report, submit your results to the two surveys:
• Respond to the “Competition” survey form with your results.
• Also respond to the “One-Time Competition” survey form with the results on that set. (That's a total of two surveys!)

## Report

Write a type-written report containing the following distinct sections as a single PDF document, and submit it by following the submission directions in the course syllabus.

1. Place your name and section number at the top of the first page.
2. If you had a team-mate, list the name of your team-mate at the top of the first page also. You may submit a single copy of the report, and each team-member will receive the same grade.
3. Also include an estimate of the amount of time you spent on this project at the top of the first page.
4. Organize your report cleanly in sections as follows:
5. [40 points] Describe your algorithm. In particular, explain all of your design choices. Be systematic and clear. Take as much space as you need to adequately explain the approach and your design choices. This is the most important part of the project.
• If you implemented a Branch and Bound algorithm, then be sure the explain the following:
• Definition of a state
• Initial Best Solution So Far (BSSF)
• Bounding function: if you re-used the reduced cost matrix method, say so. If you came up with a tighter bounding function, describe your improvements and any efficiency issues.
• State expansion strategy: e.g., Does your B&B use the include/exclude edge approach to growing the state space, or does it expand all children by including edges? If you used a feasibility test (a.k.a. “partial criterion test”) when generating children states, please explain it.
• Agenda and priority function if applicable: describe your agenda data structures and how they work. If using a priority queue, describe your priority function. See the discussion above in the To Do section regarding restrictions on which agenda you can and cannot use.
• If you used someone else’s agenda implementation, then you should describe their data structure, per the third-party code policy.
• Solution test (a.k.a. “criterion test”)
6. [10 points] Describe your baseline algorithm.
7. Include three tables containing your experimental results. Format the tables as shown below. Your table must include results for the problems shown in the tables below (one row per problem). Points are allocated as follows:
• [20 points] Competition set:
• 2 small (10 cities), pre-specified problems
• 3 large (50 cities), pre-specified problems
• 2 small (10-24 cities) problems of your choice
• 3 large (25+ cities) problems of your choice
• [10 points] Baseline results
• Copy the table again and include the results from your baseline algorithm.
• Write a short paragraph comparing and contrasting the performance of your main algorithm with your baseline algorithm.
• [10 points] One-Time Competition set:
• 3 large (40 cities), pre-specified problems
9. You will only receive points for results on the pre-specified problems if you submit them online (via the surveys).
• Document your algorithm with legible comments in important places in the code that make it obvious to the reader what your solution is doing. This documentation also provides evidence of your comprehension. With the aid of adequate documentation, the correctness of your approach should be easy for the reader (e.g., TA) to verify. If the TA cannot easily determine the code's correctness, then fewer points will be awarded.

### Competition Table

(Note: you will include two copies of this table, one for your main algorithm and one for your baseline algorithm.)

Problem SpecificationPerformance
# Cities Seed Problem # Running time (sec.) Cost of best tour found Optimal? Max. space usage
10 17 1
10 17 2
50 321 1
50 321 2
50 321 3

### One-Time Competition Table

Problem SpecificationPerformance
# Cities Seed Problem # Running time (sec.) Cost of best tour found Optimal? Max. space usage
40 733 1
40 733 2
40 733 3

## Competition and One-Time Competition (a.k.a. "The Bake-Off")

### Submitting results

To facilitate collecting results for the hall-of-fame, it is required that each student report his or her Competition results on the five pre-specified problems via a survey, the URL for which will be sent to the Google Group. It is also required to submit results for the three pre-specified one-time problems to the One-Time Competition survey. A separate link will be sent for this second survey. The links are not posted here in order to avoid abuse by folks not in the class. It is possible to reply to the Competition survey more than once (in case a student wishes to submit improved results), but only one submission is permitted to the One-Time Competition survey. Submitting invalid results or results that are not your own (i.e., computed by your code in the assigned semester) is, of course, a breach of the honor code.

### Eligibility

Students are eligible to win the overall competition if all results are submitted on time (by the project deadline on the course schedule).

### Victory

The winner of the competition will be determined from among all eligible participants across all sections of the course in a given semester. A participant's score is based on the total score of the three larger Competition pre-specified problems (specified in the table above) plus the total score of the three larger One Time Competition pre-specified problems. Winning scores will be validated by our blue ribbon panel of TAs. A hall-of-fame will be posted in class on the last day of instruction for the semester, and bragging rights will be given to the winners to help everyone understand their methods.

### Rewards

Extra credit will be awarded to top 3 winners of the overall competition. Ties are permitted.

• 30% toward the project score for first-place.
• 20% toward the project score for second-place.
• 10% toward the project score for third-place.

### FAQ

I have been asked a few questions about the contest, that I want to address for everyone’s sake:

• Q. If I have a random component in my algorithm (e.g., initial BSSF), can I run my algorithm multiple times to get a better answer? (Relevant for large problems for which one minute is insufficient to run the algorithm to completion before time-out.)
• A. Yes. Record the seed you use for the random number generator for your best generated result, so that you can reproduce your result for winner verification.
• Q. Running multiple times to get a better answer on the pre-specified report problems seems like cheating. I’m improving my algorithm in between runs to get better answers, but the randomizers are not. What are you going to do about it?
• A. I am providing three large problems called the “One-Time Competition” set. Each contestant must submit results on the test set in addition to the problems from the project report. A student is permitted to run his or her algorithm on the One-Time Competition set only one time! No improvements are permitted on the test set. Students are on their honor to run once and only once.
• Q. Are the surveys required, even though I do not feel I have a competitive algorithm?
• A. Yes. Both the original survey and the test set surveys are required as part of completion of the project. We are interested in your results.
• Q. Can we know the winning distances for previous semesters while working on our projects.
• A. No. Sorry. I suspect you will do better if you don't give up where others have stopped.
• Q. Does being fast give me an advantage in the competition?
• A. Yes, in so far as it gives you the opportunity to keep looking for better answers. A lower time does not break ties.
• Q. If I am submitting my project report late, can I still submit the surveys?
• A. Yes. Every student who completes the project (whether on time or late) must submit the surveys. The surveys will close after the last day of instruction for the term / semester.

## Appendix 1: Branch and Bound Design Options

We cover branch and bound (B&B), as well as B&B design options for the TSP, in great detail in lectures. See the class schedule for links to the lectures and some short lecture notes on B&B (in the reading column) for guidance.

In a nutshell, a branch and bound algorithm must include the following ingredients:

• An initial Best Solution So Far (BSSF)
• A bounding function (for this assignment, must be a true lower bound)
• The definition of a state
• A state expansion strategy
• An agenda (which also determines search order)
• A solution test

Optional (depending on your state expansion strategy):

• A feasibility test

In designing your algorithm, you have many options for those ingredients. In the lectures we looked at a number of possibilities for each design ingredient as follows:

• A Best Solution So Far (BSSF):
• Complete graph: a random tour, a greedy tour
• Non-complete graphs: DFS
• A bound function
• Computed using a reduced cost matrix
• Other legitimate options for lower bounds were also discussed in class
• Choose a bound function that is tighter than 0.
• The definition of a state
• Reduced cost matrix + bound + partial path (fragments)
• A state expansion strategy
• All next cities
• Include-exclude
• An agenda (which also determines search order)
• queue
• stack
• priority queue (Be sure to choose a priority function other than the bound function alone; otherwise, B&B may never update the BSSF until it finds the optimal solution, which may not happen before time runs out.)
• hybrid

In the appendices to this instructions document, we review the bounding function involving a reduced cost matrix. That bounding function is not the only possible bounding function, and it is not mandatory that you use that function, but it is a good one. We also review one promising approach to state expansion, namely the “include-exclude” approach. The “include-exclude” approach is not the only way to generate states in the state space, and it is not mandatory that you use this approach. As a further resource in designing your B&B algorithm, you should refer to the Branch and Bound notes: http://faculty.cs.byu.edu/~ringger/CS312/readings/bandb-notes.pdf

### Appendix 1.A: A Possible Bounding Function and State Representation

Suppose we are given the following instance of the traveling salesman problem for four cities in which the symbol “i” represents infinity.

i 5 4 3
3 i 8 2
5 3 i 9
6 4 3 i 

One important element of a branch and bound solution to the problem is to define a bounding function. One possible bounding function requires that we find a reduced cost matrix. The reduced cost matrix gives the additional cost of including an edge in the tour relative to a lower bound. The lower bound is computed by taking the sum of the cheapest way to leave each city plus any additional cost to enter each city. This bounding function is a lower bound because any tour must leave and enter each city exactly once, but choosing such edges may not define a solution, as we saw in class.

Note that this is not the only possible bound function. Other relaxed versions of the TSP problem are also possible.

First, let's reduce row 1. The smallest entry in row 1 is the cheapest way to leave city A. A row is reduced by taking the smallest entry in the row, 3 in this case, and subtracting it from every other entry in the row. The smallest entry (3) is also added to the lower bound. After reducing row 1, we have a bound of 3 and the following matrix:

i 2 1 0
3 i 8 2
5 3 i 9
6 4 3 i 

Next, we reduce row 2 by taking the smallest entry in row 2, 2 in this case, and subtracting 2 from each entry in row 2. We add 2 to the bound and obtain the following matrix:

i 2 1 0
1 i 6 0
5 3 i 9
6 4 3 i 

The remaining two rows are reduced in similar fashion. Lowest value 3 is subtracted from row 3, and 3 is likewise subtracted from row 4. The final bound is 3 + 2 + 3 + 3 = 11, and the reduced matrix so far is:

i 2 1 0
1 i 6 0
2 0 i 6
3 1 0 i 

Reducing the rows only accounts for the cheapest way to leave every city. Reducing the columns includes the cheapest way to enter every city. Column reduction is similar to row reduction. A column is reduced by finding the smallest entry in a column of the reduced cost matrix, subtracting that entry from every other entry in the column and adding the entry to the bound.

The smallest entry in the first column is 1 so we subtract 1 from each entry in column 1 and add 1 to the bound. The new bound is 11 + 1 = 12 and the new matrix is:

i 2 1 0
0 i 6 0
1 0 i 6
2 1 0 i 

The remaining columns are already reduced, since they already contain a 0.

### Appendix 1.B: A Possible Approach to State Expansion: Include-Exclude

Another important element of a branch and bound solution is to define the manner in which children (or “successor”) states are expanded from a given state in the state space. In the “include-exclude” approach, we generate two children for every parent: the left child represents the inclusion of an edge in the tour and the right child represents the exclusion of that same edge. The next step is to decide which edge to include or exclude. We'll assume that we want to

1. minimize the bound on the left (include) child
2. maximize the bound on the right (exclude) child.

Choosing an edge so as to maximize a bound on the right child can lead to more aggressive pruning; consequently, this can be a compelling approach to finding a solution.

We observe that it is advisable to avoid including edges that are non-zero in the reduced matrix. If a non-zero edge in the reduced matrix is included, then the extra cost of that edge (as contained in the reduced matrix) must be added to the bound on the left side. However, we are trying to minimize the bound on the left side.

Next, get the bounds of including or excluding an edge. In the reduced matrix above, there are 5 entries that contain 0. We'll compute a pair of bounds (one for include and one for exclude) for each 0-residual-cost edge and pick the one that has the maximum right child bound and the minimum left child bound.

Start with the 0 at entry (2,1). If the edge from city 2 to 1 is included in the solution, then the rest of row 2 and column 1 can be deleted since we will leave city 2 once and enter city 1 once. We get this matrix:

i 2 1 0
i i i i
i 0 i 6
i 1 0 i

This matrix must be reduced. The cost incurred during the reduction is added to the bound on the left child. In this case, no rows or columns need to be reduced. So the bound on the left child is 12 (which is the bound on the parent).

Now for the right child. If the edge between 2 and 1 is excluded, then we just replace entry (2,1) in the matrix with an infinity. We now have:

i 2 1 0
i i 6 0
1 0 i 6
2 1 0 i 

This matrix must be reduced and the bound increased by the amount reduced. Only column 1 must be reduced, and it is reduced by subtracting 1 from each entry. The bound on the right child is then 12 + 1 = 13.

Stepping back for a minute, we have now determined that the bound on including edge (2,1) is 12 and the bound on excluding edge (2,1) is 13. Can we do better using a different edge? We'll answer that question by examining all of the other 0s in the matrix.

The easy way to examine the 0s is the following. To include an edge at row i column j, look at all of the 0s in row i. If column x of row i contains a 0, look at all of the entries in column x. If the 0 in row i of column x is the only 0 in column x, then replacing row i with infinities will force a reduction in column x. So add the smallest entry in column x to the bound on including the edge at row i and column j. Perform a similar analysis for the zeros in column j. This is the bound on including an edge.

To examine the bound on excluding an edge at row i and column j, add the smallest entries in row i and in column j.

A complete examination of the 0 entries in the matrix reveals that the 0s at entries (3,2) and (4,3) give the greatest right-bound, 12+2, with the least left-bound, 12. You should verify this on your own.

So we'll split on either (3,2) or (4,3); it doesn't matter which.

After deciding which edge to split on, the next step is to do the split. Doing the split generates two new reduced matrices and bounds. These are then added to the agenda.

Following the branch-and-bound algorithm from the lectures, the next step is to extract the next node from the agenda and repeat the process. The algorithm continues until the agenda is empty or time is exhausted. If you have a state in which $n$ (the number of cities in the TSP instance you're solving) edges have been included, then you have a solution. When a solution is found, check to see if that solution improves the previous best solution (so far). If so, the new solution is the best solution so far. If the new solution is now the best solution so far, then the agenda may be pruned to avoid keeping unpromising states around.

Another important aspect of the algorithm is preventing early cycles and keeping track of the best solution so far. They are related. As you add edges to a partial solution (state), you'll need to keep track of which edges are part of the solution. Since a simple tour of the graph can't visit the same city twice, you'll need to delete edges from the state’s residual cost matrix that might result in a city being visited twice. To do this, you'll need to know which cities have been included. The cities included in a partial solution will need to be stored (along with the matrix and bound) in each state on the agenda.

### Appendix 1.C: Future Exploration

Here are a couple of suggestions for further exploration with Branch and Bound. They are listed without prejudice against things that are or are not on this list:

• Think of a better way to visualize the step by step performance of the algorithm. Perhaps show how the reduced cost matrix evolves during the search process. The visualization should do something for each state pulled from the agenda.
• Implement a better ad hoc solution to compute your initial BSSF.
• Implement a tighter bound.
• Implement a better (and more expensive?) feasibility test to prune the space earlier.
• Implement a different search strategy (try the strategy that you didn’t try in the main project: include/exclude edge, all next edges, …)
• Implement a randomized version of your algorithm.

## Appendix 2: Implementation Notes

You'll probably need a constant to represent an unreachable vertex or unusable edge. Your first inclination may be to choose -1 for this constant as a stand-in for Infinity. However, when you go to use that value or compare against it, you may be thinking in terms of Infinity even though it is not. A helpful alternative is a field from the Double structure called PositiveInfinity.