Online compendium to the article: Algorithm Selection for the Graph Coloring Problem

Nysret Musliu, Martin Schwengerer

Learning and Intelligent OptimizatioN Conference (LION 7), Catania - Italy, Jan 7-11, 2013. Lecture Notes in Computer Science, to appear.

Abstract

We present an automated algorithm selection method based on machine learning for the graph coloring problem (GCP). For this purpose, we identify 78 features for this problem and evaluate the performance of six tate-of-the-art (meta)heuristics for the GCP. We use the obtained data to train several classiffication algorithms that are applied to predict on a new instance the algorithm with the highest expected performance. To achieve better performance for the machine learning algorithms, we investigate the impact of parameters, and evaluate different data discretization and feature selection methods. Finally, we evaluate our approach, which exploits the existing GCP techniques and the automated algorithm selection, and compare it with existing heuristic algorithms. Experimental results show that the GCP solver based on machine learning outperforms previous methods on benchmark instances.

Download article [bib]

 

Instances

Instances Source Description
chi500 http://www.imada.sdu.dk/~marco/gcp-study/ 520 instances with 500 vertices and different density (0.1, 0.5, 0.9), used in [1]
chi1000 http://www.imada.sdu.dk/~marco/gcp-study/ 740 instances with 1000 vertices and different density (0.1, 0.5, 0.9), used in [1]
dimacs http://mat.gsia.cmu.edu/COLOR04/ and http://mat.gsia.cmu.edu/COLOR04/clq.html Instances of the coloring and the clique part of the DIMACS challenge.
testset download (.zip, 163 MB) 180 graphs of different size (500,750,1000 and 1250 nodes) and density (0.1, 0.5, 0.9)

 

Experimental Results

Heuristic Performance: Results of the evaluation on the instances. The result for an heuristic is the lowest number of colors for that a feasible coloring has been found in more than 50% of the runs. In addition, we add the best and worst result of the two greedy methods (RLF and DSATUR) and the best known solution (BKS) found so far.

Attributes: Basic attributes of the instances. Note that some values (e.g. regarding the greedy methods or local search probing) are extracted using random components.

Instances Heuristic Performance Attributes
chi500 .csv, 53 kB .csv, 292 kB
chi1000 .csv, 80 kB .csv, 417 kB
dimacs .csv, 11 kB .csv, 99 kB
mixed (chi500, chi1000, dimacs) .csv, 140 kB .csv, 805 kB
testset .csv, 26 kB .csv, 211 kB

 

Hardness Classification

Depending on the results of our evaluation, we classify each instance into one of the following categories:

Instances
Classification
chi500 trivial trivial2 easy hard
chi1000 trivial trivial2 easy hard
dimacs trivial trivial2 easy hard
testset trivial trivial2 easy hard

 

ARFF-Files

ARFF-files contain the attributes and the best heuristic for each instance. Note that for the algorithm selection, we focus on only on hard instances.

Data Set
Basic Attributes
Best Configuration
Training Data (hard instances) .arff, 496 kB .arff, 2 MB
Test Data (hard instances) .arff, 93 kB .arff, 362 kB

 


[1] Marco Chiarandini   Thomas Stützle, An Analysis of Heuristics for Vertex Colouring, In Ed. P. Festa, Proceedings of the 9th International Symposium on Experimental Algorithms 2010,
Lecture Notes in Computer Science, Springer

 

For more informations, contact me under: mschweng at kr.tuwien.ac.at