Learning and Intelligent OptimizatioN Conference (LION 7), Catania - Italy, Jan 7-11, 2013. Lecture Notes in Computer Science, to appear.
We present an automated algorithm selection method based on machine learning for the graph coloring problem (GCP). For this purpose, we identify 78 features for this problem and evaluate the performance of six tate-of-the-art (meta)heuristics for the GCP. We use the obtained data to train several classiffication algorithms that are applied to predict on a new instance the algorithm with the highest expected performance. To achieve better performance for the machine learning algorithms, we investigate the impact of parameters, and evaluate different data discretization and feature selection methods. Finally, we evaluate our approach, which exploits the existing GCP techniques and the automated algorithm selection, and compare it with existing heuristic algorithms. Experimental results show that the GCP solver based on machine learning outperforms previous methods on benchmark instances.
Instances | Source | Description |
chi500 | http://www.imada.sdu.dk/~marco/gcp-study/ | 520 instances with 500 vertices and different density (0.1, 0.5, 0.9), used in [1] |
chi1000 | http://www.imada.sdu.dk/~marco/gcp-study/ | 740 instances with 1000 vertices and different density (0.1, 0.5, 0.9), used in [1] |
dimacs | http://mat.gsia.cmu.edu/COLOR04/ and http://mat.gsia.cmu.edu/COLOR04/clq.html | Instances of the coloring and the clique part of the DIMACS challenge. |
testset | download (.zip, 163 MB) | 180 graphs of different size (500,750,1000 and 1250 nodes) and density (0.1, 0.5, 0.9) |
Heuristic Performance: Results of the evaluation on the instances. The result for an heuristic is the lowest number of colors for that a feasible coloring has been found in more than 50% of the runs. In addition, we add the best and worst result of the two greedy methods (RLF and DSATUR) and the best known solution (BKS) found so far.
Attributes: Basic attributes of the instances. Note that some values (e.g. regarding the greedy methods or local search probing) are extracted using random components.
Instances | Heuristic Performance | Attributes |
chi500 | .csv, 53 kB | .csv, 292 kB |
chi1000 | .csv, 80 kB | .csv, 417 kB |
dimacs | .csv, 11 kB | .csv, 99 kB |
mixed (chi500, chi1000, dimacs) | .csv, 140 kB | .csv, 805 kB |
testset | .csv, 26 kB | .csv, 211 kB |
Depending on the results of our evaluation, we classify each instance into one of the following categories:
Instances | Classification |
|||
chi500 | trivial | trivial2 | easy | hard |
chi1000 | trivial | trivial2 | easy | hard |
dimacs | trivial | trivial2 | easy | hard |
testset | trivial | trivial2 | easy | hard |
ARFF-files contain the attributes and the best heuristic for each instance. Note that for the algorithm selection, we focus on only on hard instances.
Data Set | Basic Attributes |
Best Configuration |
Training Data (hard instances) | .arff, 496 kB | .arff, 2 MB |
Test Data (hard instances) | .arff, 93 kB | .arff, 362 kB |
[1] Marco Chiarandini Thomas Stützle, An Analysis of Heuristics for Vertex Colouring, In Ed. P. Festa, Proceedings of the 9th International Symposium on Experimental Algorithms 2010,
Lecture Notes in Computer Science, Springer
For more informations, contact me under: mschweng at kr.tuwien.ac.at