Online compendium to the article: Algorithm Selection for the Graph Coloring Problem

Nysret Musliu, Martin Schwengerer

Learning and Intelligent OptimizatioN Conference (LION 7), Catania - Italy, Jan 7-11, 2013. Lecture Notes in Computer Science, to appear.

Abstract

We present an automated algorithm selection method based on machine learning for the graph coloring problem (GCP). For this purpose, we identify 78 features for this problem and evaluate the performance of six tate-of-the-art (meta)heuristics for the GCP. We use the obtained data to train several classiffication algorithms that are applied to predict on a new instance the algorithm with the highest expected performance. To achieve better performance for the machine learning algorithms, we investigate the impact of parameters, and evaluate different data discretization and feature selection methods. Finally, we evaluate our approach, which exploits the existing GCP techniques and the automated algorithm selection, and compare it with existing heuristic algorithms. Experimental results show that the GCP solver based on machine learning outperforms previous methods on benchmark instances.

Download article [bib]

Instances

Instances	Source	Description
chi500	http://www.imada.sdu.dk/~marco/gcp-study/	520 instances with 500 vertices and different density (0.1, 0.5, 0.9), used in [1]
chi1000	http://www.imada.sdu.dk/~marco/gcp-study/	740 instances with 1000 vertices and different density (0.1, 0.5, 0.9), used in [1]
dimacs	http://mat.gsia.cmu.edu/COLOR04/ and http://mat.gsia.cmu.edu/COLOR04/clq.html	Instances of the coloring and the clique part of the DIMACS challenge.
testset	download (.zip, 163 MB)	180 graphs of different size (500,750,1000 and 1250 nodes) and density (0.1, 0.5, 0.9)

Experimental Results

Heuristic Performance: Results of the evaluation on the instances. The result for an heuristic is the lowest number of colors for that a feasible coloring has been found in more than 50% of the runs. In addition, we add the best and worst result of the two greedy methods (RLF and DSATUR) and the best known solution (BKS) found so far.

Attributes: Basic attributes of the instances. Note that some values (e.g. regarding the greedy methods or local search probing) are extracted using random components.

Instances	Heuristic Performance	Attributes
chi500	.csv, 53 kB	.csv, 292 kB
chi1000	.csv, 80 kB	.csv, 417 kB
dimacs	.csv, 11 kB	.csv, 99 kB
mixed (chi500, chi1000, dimacs)	.csv, 140 kB	.csv, 805 kB
testset	.csv, 26 kB	.csv, 211 kB

Hardness Classification

Depending on the results of our evaluation, we classify each instance into one of the following categories:

trivial, contains instances where we already get the best known solution during the computation of features,
trivial2: all instances where, starting with the color number obtained by the greedy algorithms, no further improvements can been found by any search algorithm,
easy: instances where the heuristics find better solutions than the greedy methods, but these results are found by at least 50% of the solvers and in at most 5 seconds,
hard: all other instances which do not belong to one of the introduced categories are categorized as hard.

Instances	Classification
chi500	trivial	trivial2	easy	hard
chi1000	trivial	trivial2	easy	hard
dimacs	trivial	trivial2	easy	hard
testset	trivial	trivial2	easy	hard

ARFF-Files

ARFF-files contain the attributes and the best heuristic for each instance. Note that for the algorithm selection, we focus on only on hard instances.

Data Set	Basic Attributes	Best Configuration
Training Data (hard instances)	.arff, 496 kB	.arff, 2 MB
Test Data (hard instances)	.arff, 93 kB	.arff, 362 kB

[1] Marco Chiarandini Thomas Stützle, An Analysis of Heuristics for Vertex Colouring, In Ed. P. Festa, Proceedings of the 9th International Symposium on Experimental Algorithms 2010,
Lecture Notes in Computer Science, Springer

For more informations, contact me under: mschweng at kr.tuwien.ac.at