Oleg slipped a new feature into ASP 2.1 that I didn’t call much attention to: the “--left-image-crop-win” option, which makes the code process only a subsection of the stereo pair, defined in the coordinates of the left image. Pair this ability with GDAL’s gdalbuildvrt to composite the tiles and you have yourself an interesting multiprocess variant of ASP. In the next release we’ll be providing a script called “stereo_mpi” that does all of this for you, and across multiple computers if you have an MPI environment set up (such as on a cluster or supercomputer).
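To make the tiling idea concrete, here is a minimal Python sketch of how a left image could be cut into crop windows, one per process. This is not the stereo_mpi implementation; the function names, the image size, and the tile size are made up for illustration, and the argument order (x offset, y offset, width, height) is my assumption about how the option is laid out.

```python
def tile_windows(width, height, tile_size):
    """Split a width x height left image into crop windows.

    Returns (xoff, yoff, xsize, ysize) tuples. Tiles on the right and
    bottom edges are clipped so the windows cover the image exactly once.
    """
    windows = []
    for yoff in range(0, height, tile_size):
        for xoff in range(0, width, tile_size):
            xsize = min(tile_size, width - xoff)
            ysize = min(tile_size, height - yoff)
            windows.append((xoff, yoff, xsize, ysize))
    return windows


def crop_option(window):
    """Format one window as the option text for a single tile's run.

    Assumption: the option takes xoff yoff xsize ysize in that order.
    """
    return "--left-image-crop-win {} {} {} {}".format(*window)


if __name__ == "__main__":
    # Hypothetical image size and tile size, chosen only for illustration.
    for win in tile_windows(5000, 3000, 2048):
        print(crop_option(win))
```

Each printed option would go on its own stereo invocation, and the per-tile outputs could then be stitched back together with gdalbuildvrt.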
Our code was already multithreaded and could bring a single computer to its knees with ease. Being a multiprocess application allows us to take over multiple computers. It also allows us to speed up sections of the code that are not thread-safe by running them in parallel. That works because processes don’t share memory with each other the way threads do. Each process gets its own copy of the non-thread-safe ISIS and CSpice libraries and can thus run them simultaneously. However, this also means that our image cache system is not shared among the processes. I haven’t noticed this to be much of a performance hit.
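The point about processes each getting their own copy of library state can be seen in a toy Python sketch. Nothing here touches ISIS or CSpice; the `state` dictionary and `unsafe_call` are hypothetical stand-ins for a library that keeps global, non-thread-safe state, and each worker is launched as a separate interpreter so nothing is shared.

```python
import subprocess
import sys

# Source for one worker. The module-level dict stands in for the global
# state of a non-thread-safe library (hypothetical, for illustration).
CHILD = """
state = {"calls": 0}

def unsafe_call():
    state["calls"] += 1
    return state["calls"]

for _ in range(3):
    n = unsafe_call()
print(n)
"""


def run_workers(count):
    """Start `count` independent processes and collect what each printed."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(count)
    ]
    # Every worker runs concurrently; communicate() waits for each to exit.
    return [int(p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    print(run_workers(4))
```

Every worker reports only its own three calls, because no worker can see another's counter. The flip side is the one mentioned above: anything you would like to share, such as a cache, is also duplicated per process.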
I still have an account on NASA’s Pleiades, so I decided to create a 3D model of Aeolis Planum using CTX imagery and 3 methods now available in the ASP code repo. Those options are the traditional stereo command using one node, stereo_mpi using one node, and finally stereo_mpi using two nodes. Here are the results:
|Command||Walltime||CPU Time||Mem Used|
|stereo_mpi --mpi 1||00:32:11||11:28:44||5.77 GB|
|stereo_mpi --mpi 2||00:21:46||10:10:22||5.52 GB|
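A quick way to read those two stereo_mpi rows is to turn the times into seconds and compare them. The sketch below is just arithmetic on the numbers in the table; the helper names are mine, and it assumes the CPU time column is summed over every thread and process in the job.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def busy_cores(walltime, cpu_time):
    """Average number of cores kept busy: CPU time divided by walltime."""
    return to_seconds(cpu_time) / to_seconds(walltime)


def speedup(walltime_before, walltime_after):
    """How many times faster the second run finished."""
    return to_seconds(walltime_before) / to_seconds(walltime_after)


# (walltime, CPU time) copied from the table above.
RUNS = {
    "stereo_mpi, 1 node": ("00:32:11", "11:28:44"),
    "stereo_mpi, 2 nodes": ("00:21:46", "10:10:22"),
}

if __name__ == "__main__":
    for name, (wall, cpu) in RUNS.items():
        print(f"{name}: {busy_cores(wall, cpu):.1f} cores busy on average")
    one_node = RUNS["stereo_mpi, 1 node"][0]
    two_nodes = RUNS["stereo_mpi, 2 nodes"][0]
    print(f"speedup from 1 to 2 nodes: {speedup(one_node, two_nodes):.2f}x")
```

That works out to roughly 21 busy cores on one node, 28 across two nodes, and a speedup of about 1.5x rather than 2x from doubling the hardware.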
The stereo_mpi command beats the traditional stereo command on walltime entirely because it can run the triangulation step as parallel processes. Unfortunately, not every step of ASP can be performed with multiple processes, due to interdependencies between the tiles. Here’s a quick handy chart of which steps can be multiprocessed or multithreaded. (Well … we could make the processes actually communicate with each other via MPI but … that would be hard.)
|Multithread||x||x||x||x||DG/RPC sessions only|
Just for reference, here’s the VWRC file I used for all 3 runs and the PBS job script for the 2 node example. All runs were performed with Bayes EM subpixel and homography pre-alignment.
    [general]
    default_num_threads = 24
    write_pool_size = 15
    system_cache_size = 200000000
    #PBS -S /bin/bash
    #PBS -W group_list=#####
    #PBS -q normal
    #PBS -l select=2:model=wes
    #PBS -l walltime=1:30:00
    #PBS -m e

    # Load MPI so we have the MPIEXEC command for the Stereo_MPI script
    module load mpi-mvapich2/1.4.1/intel

    cd /u/zmoratto/nobackup/Mars/CTX/Aeolis_Planum_SE
    stereo_mpi P02_002002_1738_XI_06S208W.cal.cub P03_002279_1737_XI_06S208W.cal.cub mpi2/mpi2 --mpiexec 2 --processes 16 --threads-multi 4 --threads-single 24
Come to think of it, I was probably cheating the traditional stereo by leaving the write pool size set to 15.
I also tried this same experiment with the HiRISE stereo pair of Hrad Vallis that we ship in our binaries. Unfortunately, the single node runs didn’t finish within 8 hours and were shut off. Again, this is homography alignment plus Bayes EM subpixel. Everything would have finished much sooner if I had used parabola subpixel.
|Command||Walltime||CPU Time||Mem Used||Completed|
|stereo_mpi --mpi 1||08:00:28||181:55:00||10.0 GB||Nope|
|stereo_mpi --mpi 6||02:18:19||221:41:52||44.9 GB||Yep|