The number one complaint from ASP users is “make it faster.” The number two complaint is “what is the LE90 of my DEM?” I’m only going to take a stab at the first one in this post, and I’ll approach it from the lazy perspective of just changing the compiler rather than implementing any crazy new algorithms, because algorithms are hard.
On OSX we build ASP’s binaries with Apple’s variant of GCC 4.2; on Linux we use GCC 4.4.6 from RHEL6. Those compilers are relatively old: the newer of the two, GCC 4.4.6, was compiled back in 2010. Since then, new versions of GCC have been released and Clang++ has matured. There are even new processor instruction sets, such as the 256-bit-wide AVX.
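Having AVX in the CPU isn’t enough; the compiler has to be told to emit it. GCC and Clang define the `__AVX__` preprocessor macro when AVX code generation is enabled (for example with `-mavx`) and leave it undefined under `-mno-avx`. The sketch below, where `simd_target` is a made-up helper name, reports which setting a translation unit was built with, a cheap way to confirm a flag actually took effect:

```cpp
#include <string>

// Hypothetical helper: reports the widest x86 SIMD target enabled at compile
// time for this translation unit. GCC and Clang define __AVX__ when AVX code
// generation is on (e.g. -mavx) and __SSE2__ when SSE2 is on, which is the
// x86-64 baseline. Other architectures fall through to "scalar" here.
std::string simd_target() {
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "scalar";
#endif
}
```

On an x86-64 machine, building with `-O3 -mavx` should make this return `"AVX"`, while `-O3 -mno-avx` should give `"SSE2"`.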
The first test simply recorded the run time of our unit tests for the Bayes EM subpixel refinement algorithm. I tested on both an OSX 10.7.5 system with a Core i7-2675QM and an Ubuntu 12.04 system with an AMD FX 8120; both support the AVX instruction set. I got newer compilers on the OSX system through MacPorts. On the Ubuntu box I installed almost everything through Aptitude, except the Clang 3.1 binaries, which I downloaded directly from the LLVM website.
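The post doesn’t say exactly how the averages and standard deviations below were gathered. As a minimal sketch, assuming each test is run several times and timed with a wall clock, the statistics could be computed like this (`time_runs`, `summarize`, and the use of the sample standard deviation are all assumptions for illustration):

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Timing {
    double mean;    // average wall-clock time in seconds
    double stddev;  // sample standard deviation in seconds
};

// Mean and sample standard deviation (n - 1 denominator) of the samples.
Timing summarize(const std::vector<double>& samples) {
    const std::size_t n = samples.size();
    if (n == 0) return {0.0, 0.0};
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / static_cast<double>(n);
    double sq = 0.0;
    for (double s : samples) sq += (s - mean) * (s - mean);
    const double stddev = n > 1 ? std::sqrt(sq / static_cast<double>(n - 1)) : 0.0;
    return {mean, stddev};
}

// Run `work` `runs` times, timing each call with a monotonic clock.
Timing time_runs(const std::function<void()>& work, int runs) {
    std::vector<double> samples;
    if (runs > 0) samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return summarize(samples);
}
```

A steady clock is used because, unlike the system clock, it cannot jump backwards mid-measurement.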
Bayes EM on Ubuntu 12.04 (AMD FX 8120). Each cell is Avg Time ± Std Dev, with the Sum Px Error in parentheses; blank cells had no result.

Compiler | -O3 -mno-avx | -O3 -mavx | -O3 -mavx -funsafe | -Ofast -mavx -funsafe
---|---|---|---|---
g++-4.4 | 2.714 ± 0.146 (0.871589) | 2.629 ± 0.142 (0.871589) | 2.629 ± 0.172 (0.887567) |
g++-4.5 | 2.621 ± 0.05 (0.871589) | 2.587 ± 0.034 (0.871589) | 2.669 ± 0.183 (0.887566) |
g++-4.6 | 2.493 ± 0.009 (0.871589) | 2.743 ± 0.1 (0.871589) | 2.542 ± 0.173 (0.88774) | 2.285 ± 0.125 (0.88774)
g++-4.7 | 2.439 ± 0.017 (0.871589) | 2.62 ± 0.127 (0.871589) | 2.581 ± 0.111 (0.887566) | 2.36 ± 0.202 (0.887566)
clang++-2.9 | segfaulted | | |
clang++-3.0 | 2.29 ± 0.195 (0.871589) | 2.475 ± 0.159 (14.2007) | 2.44 ± 0.102 (14.2007) |
clang++-3.1 | 2.434 ± 0.215 (0.871589) | 2.492 ± 0.238 (0.871589) | 2.309 ± 0.225 (0.87157) |
Bayes EM on OSX 10.7.5 (Core i7-2675QM). Each cell is Avg Time ± Std Dev, with the Sum Px Error in parentheses; blank cells had no result.

Compiler | -O3 | -O3 -funsafe | -Ofast
---|---|---|---
g++-4.2 | 2.59 ± 0.103 (0.871582) | 2.52 ± 0.111 (0.887563) |
g++-4.4.7 | 2.48 ± 0.212 (0.871582) | 2.27 ± 0.027 (0.887563) |
g++-4.5.4 | 2.265 ± 0.03 (0.871582) | 2.187 ± 0.032 (0.887564) |
g++-4.7.1 | 2.122 ± 0.036 (0.871582) | 2.005 ± 0.02 (0.887777) | 1.905 ± 0.011 (0.887777)
clang++-2.1 | 2.193 ± 0.021 (0.871582) | 2.485 ± 0.313 (0.871582) |
clang++-2.9 | 2.273 ± 0.014 (0.871582) | 2.247 ± 0.039 (0.871582) |
clang++-3.1 | 1.996 ± 0.013 (0.871582) | 1.91 ± 0.014 (0.871586) |
llvm-g++-4.2 | 2.149 ± 0.008 (0.871582) | 2.19 ± 0.027 (0.871582) |
I tested Clang 2.9 on Ubuntu, but unfortunately every compile ended with the compiler itself segfaulting. Clang 3.0 worked until I manually turned on `-mavx`, which brought no improvement in Bayes EM performance but did make the test code return bad results (a Sum Px Error of 14.2007 instead of 0.871589). Overall, GCC 4.7 and Clang 3.1 were about 20% faster than GCC 4.4 (for example, on OSX at -O3, 2.48 for GCC 4.4.7 versus 2.122 for GCC 4.7.1 and 1.996 for Clang 3.1).
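The `-funsafe` and `-Ofast` results also show a small, consistent shift in Sum Px Error (0.871589 to about 0.8876 with GCC on Ubuntu), which is a different thing from Clang 3.0’s 14.2007 under `-mavx`. The tests don’t explain the shift. One plausible cause, assuming `-funsafe` is shorthand for GCC’s `-funsafe-math-optimizations` (which `-Ofast` also turns on via `-ffast-math`), is that these flags let the compiler reassociate floating-point arithmetic, and floating-point addition is not associative. This sketch shows the effect by hand, with no special flags; the function names are made up for illustration:

```cpp
#include <cstddef>
#include <vector>

// Sum strictly left to right, the order the source specifies. Without
// reassociation flags the compiler must preserve this order.
float sum_in_order(const std::vector<float>& v) {
    float total = 0.0f;
    for (float x : v) total += x;
    return total;
}

// Sum with four interleaved partial accumulators, roughly the kind of
// reordering a vectorizer may apply once reassociation is permitted.
float sum_reassociated(const std::vector<float>& v) {
    float lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < v.size(); ++i) lanes[i % 4] += v[i];
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

With `v = {1e8f, 1.0f, -1e8f, 1.0f}` on a typical x86-64 build, the first function returns 1 and the second returns 0: near 1e8 a `float` can only step in increments of 8, so each added 1.0 is either kept or rounded away depending on the order of operations.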
I also tested the performance of our integer correlator under different compilers.
Integer correlator on Ubuntu 12.04 (AMD FX 8120). Each cell is Avg Time ± Std Dev; blank cells had no result.

Compiler | -O3 -mno-avx | -O3 -mavx | -O3 -mavx -funsafe | -Ofast -mavx -funsafe
---|---|---|---|---
g++-4.4 | 8.288 ± 0.037 | 8.136 ± 0.03 | 8.127 ± 0.032 |
g++-4.5 | 8.396 ± 0.014 | 8.267 ± 0.024 | 8.326 ± 0.022 |
g++-4.6 | 5.168 ± 0.078 | 5.094 ± 0.022 | 5.102 ± 0.019 | 5.11 ± 0.022
g++-4.7 | 4.525 ± 0.019 | 4.624 ± 0.014 | 4.669 ± 0.012 | 4.638 ± 0.017
clang++-2.9 | | | |
clang++-3.0 | 5.147 ± 0.053 | 5.079 ± 0.094 | 5.06 ± 0.012 |
clang++-3.1 | 5.119 ± 0.012 | 5.059 ± 0.32 | 4.949 ± 0.016 |
Integer correlator on OSX 10.7.5 (Core i7-2675QM). Each cell is Avg Time ± Std Dev; blank cells had no result.

Compiler | -O3 | -O3 -funsafe | -Ofast
---|---|---|---
g++-4.2 | 8.973 ± 0.096 | 8.654 ± 0.047 |
g++-4.4.7 | 8.61 ± 0.034 | 8.654 ± 0.181 |
g++-4.5.4 | 8.131 ± 0.083 | 7.67 ± 0.033 |
g++-4.7.1 | 4.044 ± 0.024 | 4.084 ± 0.03 | 3.9 ± 0.023
clang++-2.1 | 5.077 ± 0.023 | 5.072 ± 0.029 |
clang++-2.9 | 5.211 ± 0.032 | 5.192 ± 0.013 |
clang++-3.1 | 4.966 ± 0.018 | 4.973 ± 0.027 |
llvm-g++-4.2 | 5.097 ± 0.023 | 5.113 ± 0.021 |
Here the newer compilers showed much larger gains. GCC 4.7 roughly doubled the speed of GCC 4.4 (8.61 versus 4.044 on OSX at -O3), and Clang 3.1 came in at about 1.6 to 1.7 times faster. Clang also produced correct code every time it compiled, unlike in the Bayes EM tests. Even so, I would still recommend sticking with the safe and stable GCC. Its 4.7 release matched or beat the Clang compilers, and GCC offers the peace of mind of having always compiled VW correctly. Clang still has me on edge: it is so aggressive with SIMD that it has burned me many times by emitting instructions that segfaulted.
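“Twice as fast” and “half the run time” describe the same result, but percentages quoted against run time and against speed are easy to mix up. A small sketch of the two measures, applied to the OSX -O3 correlator numbers (8.61 for g++-4.4.7, 4.044 for g++-4.7.1); the helper names are made up for illustration:

```cpp
// Speedup: how many times faster the new build runs (old / new).
// A value of 2.0 is a "100% speed improvement".
double speedup(double old_time, double new_time) {
    return old_time / new_time;
}

// Fractional reduction in run time: (old - new) / old.
// A value of 0.5 means the new build takes half as long.
double time_reduction(double old_time, double new_time) {
    return (old_time - new_time) / old_time;
}
```

For 8.61 versus 4.044, `speedup` gives about 2.13 (roughly 113% faster) and `time_reduction` about 0.53, so the new build needs 53% less time. The same check on the Ubuntu -O3 -mno-avx numbers (8.288 versus 4.525) gives a speedup of about 1.83.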