Parallel simulated annealing algorithm for graph coloring

The paper describes an application of Parallel Simulated Annealing (PSA) to solving one of the most studied NP-hard optimization problems: the Graph Coloring Problem (GCP). A synchronous master-slave model with periodic solution update is used. The paper contains a description of the method, recommendations for optimal parameter settings and a summary of results obtained during the algorithm's evaluation. A comparison of our novel approach to a PGA metaheuristic proposed in the literature is given. Finally, directions for further work on the subject are suggested.


Introduction
Let G = (V, E) be a given graph, where V is a set of |V| = n vertices and E a set of |E| = m edges. The Graph Coloring Problem (GCP) [1,2] is defined as the task of finding an assignment of k colors to vertices, c : V → {1, ..., k}, k ≤ n, such that there is no conflict of colors between adjacent vertices, i.e. ∀(u,v) ∈ E : c(u) ≠ c(v), and the number of colors k used is minimal (such minimal k is called the chromatic number χ(G) of the graph).
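The two conditions of the definition above can be sketched in code; this is an illustrative check (the function names and the triangle example are ours, not from the paper):

```python
# Sketch (illustrative, not from the paper): checking whether an
# assignment c is a proper coloring, i.e. no edge joins two vertices
# of the same color, and counting the colors k it uses.

def is_proper_coloring(edges, c):
    """Return True if no edge (u, v) has c[u] == c[v]."""
    return all(c[u] != c[v] for u, v in edges)

def colors_used(c):
    """Number of distinct colors k in the assignment."""
    return len(set(c.values()))

# Example: a triangle (complete graph K3) needs 3 colors.
triangle = [(0, 1), (1, 2), (0, 2)]
c = {0: 1, 1: 2, 2: 3}
assert is_proper_coloring(triangle, c) and colors_used(c) == 3
```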
A number of GCP variants are used for testing printed circuits, frequency assignment in telecommunication, job scheduling and other combinatorial optimization tasks. The problem is known to be NP-hard [3]. Intensive studies of the problem have resulted in a large number of approximate and exact solving methods. GCP was the subject of the Second DIMACS Challenge [4] held in 1993 and of the Computational Symposium on Graph Coloring and Generalizations organized in 2002. The graph instances [5] and reported research results are frequently used in the development of new coloring algorithms and for reference purposes.
The purpose of the paper is to present a new algorithm capable of solving GCP, developed on the basis of the Parallel Simulated Annealing (PSA) method [16]. Recent years have brought a rapid development of PSA techniques. Classical Simulated Annealing [17] has been transferred to parallel processing environments in various ways: the most popular approaches involve parallel moves, where a single Markov chain is evaluated by multiple processing units calculating possible moves from one state to another. The other method uses multiple threads to compute independent chains of solutions and to exchange the obtained results on a regular basis. Broad studies of both techniques can be found in [18,19]. The PSA scheme for GCP proposed by the authors includes both strategies, with the rate of current solution update used as a distinctive control parameter.
The paper is organized as follows. The next section is devoted to the description of the proposed PSA algorithm. Besides its general structure, details of the cooling schedule and the cost function being used, as well as the neighborhood solution generation procedure, are given. The subsequent part of the paper presents the results of the algorithm's experimental evaluation. The final part of the contribution gives general comments on the performance of the PSA algorithm and possible directions for future work on the subject.

Parallel Simulated Annealing
The PSA algorithm for GCP introduced in this paper uses multiple processors working concurrently on individual chains and agreeing on the current solution at fixed iteration intervals. The aim of the routine is to minimize a chosen cost function while storing the best solution found. The coordination of the algorithm is performed in a master-slave model: one of the processing units is responsible for collecting solutions, choosing the current one and distributing it among the slave units.
The exchange interval e_i is a parameter which decides which PSA scheme is used. Setting e_i = 1 is equivalent to producing a single chain of solutions using the multiple moves strategy. Increasing the interval e_i leads to creating semi-independent chains on all slave processors, each concurrent round starting from the same established solution. Setting e_i to infinity results in performing independent simulated annealing runs. The general scheme of the algorithm is presented below. Detailed information about our Simulated Annealing algorithm, such as the cooling scheme, the representation of the solution, the method of neighborhood generation and the cost calculation, is given in the following subsections.
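The master-slave scheme with exchange interval e_i described above can be sketched as follows. This is a minimal sequential simulation under our own assumptions (a toy integer state with cost |x|, illustrative function names), not the authors' MPI implementation:

```python
# Sequential sketch of master-slave PSA with an exchange interval.
# The "slaves" are simulated by a loop; the toy state and cost are
# illustrative stand-ins for a graph coloring and its cost function.
import math
import random

def cost(solution):
    """Toy cost: distance from the optimum at 0."""
    return abs(solution)

def sa_step(solution, temperature, rng):
    """One Metropolis step on an integer 'solution'."""
    candidate = solution + rng.choice([-1, 1])
    delta = cost(candidate) - cost(solution)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return candidate
    return solution

def psa(slaves_no=4, exchange_interval=10, rounds=50,
        t0=10.0, alpha=0.95, seed=1):
    rng = random.Random(seed)
    current = 100                    # shared starting solution
    best = current
    t = t0
    for _ in range(rounds):
        # Each "slave" runs exchange_interval SA iterations starting
        # from the current solution (semi-independent chains).
        results = []
        for _ in range(slaves_no):
            s = current
            for _ in range(exchange_interval):
                s = sa_step(s, t, rng)
            results.append(s)
        # The master collects the chains' results, keeps the best one
        # as the new current solution, and cools the temperature.
        current = min(results, key=cost)
        best = min(best, current, key=cost)
        t *= alpha
    return best
```

With exchange_interval = 1 this degenerates to the multiple moves strategy, and letting the interval grow toward the total iteration count yields independent runs, as described above.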

Cooling Schedule
Choosing a proper temperature schedule is crucial for any algorithm based on the Simulated Annealing methodology, since it influences the acceptance probability of positive transitions (i.e. when the cost difference Δcost,i between a generated neighbor and the initial solution is positive), given by the Metropolis rule [20]:

P(Δcost,i) = exp(-Δcost,i / T_i).

As a result of intensive studies in this area, multiple cooling strategies have been developed [21]. The cooling schedule used here is the exponential one:

T_{i+1} = α · T_i,

where α is the cooling rate (usually set at the 0.80-0.99 level [22]) for each cooling step. Every SA step consists of M_i iterations. For the exponential schedule the following holds:

M_{i+1} = β · M_i.

In order to gradually extend SA runs at lower temperature levels, the constant β is usually chosen from the range (1.01, 1.20]. In addition to a proper cooling schedule one has to choose a correct initial temperature T_0. The authors used the most common method, which involves calculating the average cost difference Δcost,0 from a set of pilot runs consisting of positive transitions from an initial state. A preliminary temperature assuring a desired initial acceptance probability P(Δcost,0) can then be calculated from the equation:

T_0 = -Δcost,0 / ln P(Δcost,0).

An alternative approach could follow the universal method for initial temperature selection introduced in [23].
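The pilot-run method for T_0 and the exponential schedule above can be sketched as follows; the pilot cost differences used in the example are made-up sample values, not experimental data:

```python
# Sketch of the exponential cooling schedule and the pilot-run method
# for choosing T0; sample deltas are illustrative, not from the paper.
import math
import statistics

def initial_temperature(positive_deltas, p0):
    """T0 such that the average positive transition is accepted with
    probability p0 under the Metropolis rule exp(-delta / T)."""
    avg_delta = statistics.mean(positive_deltas)
    return -avg_delta / math.log(p0)

def cooling(t0, alpha, m0, beta, steps):
    """Yield (temperature, iterations) pairs following
    T_{i+1} = alpha * T_i and M_{i+1} = beta * M_i."""
    t, m = t0, m0
    for _ in range(steps):
        yield t, round(m)
        t *= alpha          # cool the temperature
        m *= beta           # lengthen runs at lower temperatures

deltas = [2.0, 3.0, 4.0]               # pilot-run positive cost differences
t0 = initial_temperature(deltas, 0.7)  # target 70% initial acceptance
assert abs(math.exp(-3.0 / t0) - 0.7) < 1e-9
```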

Solution Representation and Neighborhood Generation
A graph coloring c is represented by a sequence of natural numbers c = <c[1], ..., c[n]>, c[i] ∈ {1, ..., k}, which is equivalent to a set partition representation with exactly k non-empty blocks.
A rule for generating a neighbor solution can be selected from a wide range of existing methods [24]. For the purpose of the presented algorithm the following form of restricted 1-exchange neighborhood is used:

Cost Assessment
As a quality measure of a selected coloring c the following cost function was used [13]:

f(c) = d · Σ_{(u,v)∈E} q(u, v) + k,

where q is a penalty function equal to 1 when c(u) = c(v) and 0 otherwise, d is a penalty coefficient for solutions with conflicts (d = 0 when Σ_{(u,v)∈E} q(u, v) = 0), and k is the number of colors used.
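The cost function above can be sketched directly; the value of the coefficient d in the example is an illustrative choice, not the one used in the paper:

```python
# Sketch of the cost function: number of colors k plus a penalty d for
# every conflicting edge. The coefficient d here is illustrative only.

def coloring_cost(edges, c, d):
    """f(c) = d * (number of monochromatic edges) + (colors used)."""
    conflicts = sum(1 for u, v in edges if c[u] == c[v])
    k = len(set(c.values()))
    return d * conflicts + k

triangle = [(0, 1), (1, 2), (0, 2)]
proper = {0: 1, 1: 2, 2: 3}
assert coloring_cost(triangle, proper, d=3) == 3       # conflict-free: cost = k
clash = {0: 1, 1: 1, 2: 2}
assert coloring_cost(triangle, clash, d=3) == 3 + 2    # one conflict penalized
```

With d large enough, any conflict-free coloring is cheaper than any conflicting one, so minimizing f(c) drives the search first toward feasibility and then toward fewer colors.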

Experimental Evaluation
For testing purposes an implementation of the algorithm based on the Message Passing Interface was prepared. All experiments with simulated parallelism were carried out on an Intel® Xeon™ machine. Standard DIMACS graphs, obtained from [5], were used as test instances. The following values of the SA control parameters were chosen for the experiments: α = 0.95 and β = 1.05. The initial temperature was determined from a pilot run consisting of 1% (relative to the overall iteration number) positive transitions. The termination condition was either achieving the optimal solution or reaching the required number of iterations. Due to space limitations only the most representative results are presented in the paper. The full set of simulation data can be found on the first author's web site (http://www.pk.edu.pl/~szymonl).

SA Parameters Settings
At first, the optimal values of the SA parameters were under investigation. Essential results of those experiments are gathered in Table 1. An opening set of 500 runs with final temperature T_f = 0.1, iter_no = 10000, a randomly generated initial solution with k = χ(G) and the initial probability changing within the range 10%-90% proved that the best algorithm performance, measured primarily by the minimum average cost function (the second criterion was the iteration number), is achieved for high initial probabilities, with an optimum found at about 60%-80%. It was observed, however, that the exact choice of P(Δcost,0) in this range is not very significant for the overall algorithm's performance. The obtained results confirmed the hypothesis that for more complex problems it is advisable to use higher values of the initial temperature.
In the next experiment the optimal final temperature (relative to T_0) was under examination. For fixed P(Δcost,0) = 70%, iter_no = 10000 and k = χ(G) the best results were obtained for T_f ∈ [0.01, 0.2]·T_0. Again, higher solution quality for more complex graph instances was achieved with increased temperature ratios.
The influence of the initial number of colors k_0 on the solution quality was also determined experimentally. The range [χ(G) - 5, χ(G) + 5] was under consideration, with P(Δcost,0) = 70%, T_f = 0.05·T_0 and iter_no = 10000. It was observed that using an initial color number slightly different from the chromatic number does not significantly affect the algorithm's performance. For some graph instances it is even recommended to start with colorings with k_0 lower than χ(G).
Finally, it should be noted that the statements presented above are to be treated as overall guidelines for SA parameter settings, obtained from a relatively small set of graphs. The exact values of those parameters depend largely on the considered class of graph instances.

Influence of Parallelization Schemes
The second stage of the computing experiments involved an examination of the algorithm's performance with different parallelization schemes and a comparison with results obtained with the sequential Simulated Annealing algorithm. For PSA, configurations with e_i ∈ {1, 2, 4, 6, 8, 10, ∞} and a number of slaves from 2 to 18 were tested on various graph instances. To examine the effect of parallelization on the processing time, the same number of iterations iter_no = 100000 was set for both the sequential SA and the PSA algorithms (in PSA each slave performs only iter_no/slaves_no iterations). For the temperature schedule the following settings were applied: P(Δcost,0) = 70% and T_f = 0.05·T_0.
The obtained results include mean values of the cost function, the number of conflict-free/optimal solutions, the number of iterations needed to find an optimal coloring (if applicable), and the algorithm's execution time t[s] (until the best solution was found). A summary of the results is presented in Table 2.
The best and worst parallel configurations, in terms of average f(c) and processing time, are reported together with the obtained results. The average performance of the algorithm is given as well, for reference.
PSA clearly outperforms sequential Simulated Annealing in terms of computation time. Moreover, applying parallelization improves the quality of the obtained solutions. It can be seen, though, that it is important to select a proper configuration of the PSA algorithm to achieve high efficiency. For most problem instances it is advisable to use the multiple moves strategy with an optimal, relatively small, number of slaves. There is one exception to this statement: for the class of mulsol.i graphs significantly better results were obtained with fully independent SA runs.
The worst results of using Parallel Simulated Annealing were obtained when a high number of slaves was involved in the computations and the parallelization scheme was far from the optimal one.

Comparison with Parallel Genetic Algorithm
The last stage of the testing procedure involved a comparison of the time efficiency of the PSA algorithm and the Parallel Genetic Algorithm introduced in [12]. The implementation of the PGA for GCP used in [13] was applied. Both algorithms were executed on the same machine for selected DIMACS graph instances and the computation time needed to find an optimal coloring was reported. PGA was executed with 3 islands, subpopulations consisting of 60 individuals, migration rate 5, migration size 5 with the best individuals being distributed, initial number of colors 4 and the operators: CEX crossover (with 0.6 probability) and First-Fit mutation (with 0.1 probability). For PSA, 3 slaves were used, with P(Δcost,0) = 70%, T_f = 0.05·T_0 and iter_no = 300000 (for the instance mulsol.i.1 runs of 1500000 iterations were needed to find the optimal solution). For most instances the multiple moves strategy was applied. One exception was the mulsol.i class of graphs where, according to earlier observations, independent SA runs were executed. The results of the experiments, enclosed in Table 3, clearly demonstrate that PSA performance is comparable to that achieved by the PGA. For some graph instances, like the book graphs and miles500, the proposed algorithm was found to be superior. On the other hand, there exists a group of problems relatively easy to solve by PGA and, at the same time, difficult to solve by PSA (like mulsol.i.1).

Conclusion
In the paper a new Parallel Simulated Annealing algorithm for GCP was introduced and evaluated. The first experiments revealed that its performance depends on choosing a cooling schedule and a method of generating the initial coloring suitable to the considered problem. Some general guidelines were derived for the algorithm's settings that ensure better solution quality. Further research on the subject could concern adaptive cooling schedules and the generation of the initial solution by means of an approximate method.
Choosing an optimal number of processing units and a parallelization scheme for the PSA was also under consideration. We found that the problem specification essentially influences the proper choice of these elements. However, it can be stated as a general remark that the highest efficiency of the master-slave PSA algorithm is achieved for an optimal, relatively small number of slaves.
During the performance evaluation the PSA algorithm proved to be an effective tool for solving the Graph Coloring Problem. The experiments showed that it achieves a performance level similar to that of PGA. The comparison of both methods showed that neither of them is superior. This encourages efforts to develop new hybrid metaheuristics which would benefit from the advantages of both the PGA and PSA approaches. The overall concept of a hybrid algorithm could implement the idea presented in [25] and include some other improvements of the standard PSA scheme as proposed in [26].

Table 1. Influence of Simulated Annealing parameters on the algorithm's performance

Table 2. Experimental evaluation of the PSA algorithm for GCP

Table 3. Comparison of the time efficiency of the PGA and PSA algorithms applied to GCP