Moar Processes

Oleg slipped a new feature into ASP 2.1 that I didn't call much attention to: the "--left-image-crop-win" option, which causes the code to process only a subsection of the stereo pair, defined in the coordinates of the left image. Pair this ability with GDAL's gdalbuildvrt to composite tiles and you have yourself an interesting multiprocess variant of ASP. In the next release we'll be providing a script called "stereo_mpi" that does all of this for you, and across multiple computers if you have an MPI environment set up (such as on a cluster or supercomputer).
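Here's a rough sketch of the manual version of that idea. The file names and tile coordinates are made up, and --left-image-crop-win takes the crop as xoff yoff xsize ysize in left-image pixels:

# Hypothetical example: split the left image into two tiles and run a
# separate stereo process on each crop window (left-image coordinates).
mkdir -p tile0 tile1
stereo --left-image-crop-win 0    0 2500 5000 left.cub right.cub tile0/out &
stereo --left-image-crop-win 2500 0 2500 5000 left.cub right.cub tile1/out &
wait

# Grid each tile's point cloud into a DEM, then composite with GDAL
# (assuming both tiles were gridded with the same projection and spacing).
point2dem tile0/out-PC.tif
point2dem tile1/out-PC.tif
gdalbuildvrt full_dem.vrt tile0/out-DEM.tif tile1/out-DEM.tif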

Our code was already multithreaded and could bring a single computer to its knees with ease. Being a multiprocess application allows us to take over multiple computers. It also lets us parallelize sections of the code that are not thread-safe, because processes, unlike threads, don't share memory with each other. Each process gets its own copy of the non-thread-safe ISIS and CSpice libraries and can therefore run them simultaneously. However, this also means that our image cache system is not shared among the processes. I haven't noticed this to be much of a hit in performance.
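As an aside, the same trick works for any ISIS workload: since the libraries aren't thread-safe, the parallelism comes from separate processes, each loading its own copy. A made-up example using xargs (the file pattern is hypothetical):

# Run spiceinit on several cubes at once, 4 processes at a time.
# Each spiceinit is its own process with its own copy of ISIS/CSpice.
ls *.cal.cub | xargs -n 1 -P 4 -I {} spiceinit from={}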

I still have an account on NASA's Pleiades, so I decided to create a 3D model of Aeolis Planum from CTX imagery using the 3 methods now available in the ASP code repo: the traditional stereo command on one node, stereo_mpi on one node, and stereo_mpi on two nodes. Here are the results:

Aeolis Planum processing runs on Westmere Nodes
Command               Walltime   CPU Time   Mem Used
stereo                00:44:40   08:46:57    1.71 GB
stereo_mpi --mpi 1    00:32:11   11:28:44    5.77 GB
stereo_mpi --mpi 2    00:21:46   10:10:22    5.52 GB

The stereo_mpi command is faster in walltime than the traditional stereo command entirely because it can process the triangulation step in parallel. Unfortunately, not every step of ASP can be performed with multiple processes, due to interdependencies between the tiles. (Well … we could make the processes actually communicate with each other via MPI, but that would be hard.) Here's a quick handy chart of which steps can be multiprocessed or multithreaded.

ASP Steps: Multiprocess or Multithreaded
               PPRC   CORR   RFNE   FLTR   TRI
Multithread     x      x      x      x     DG/RPC sessions only
Multiprocess           x      x            x
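For reference, those column abbreviations are the individual stages the stereo script runs in order, and each can be invoked on its own with the -e/--entry-point option (the stage numbering here is from memory, so treat it as an assumption):

# PPRC -> stereo_pprc  (preprocessing / alignment)
# CORR -> stereo_corr  (integer correlation)
# RFNE -> stereo_rfne  (subpixel refinement)
# FLTR -> stereo_fltr  (outlier filtering)
# TRI  -> stereo_tri   (triangulation into a point cloud)
#
# e.g. re-running only the triangulation stage of an existing run:
stereo -e 4 left.cub right.cub results/out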

Just for reference, here's the .vwrc file I used for all 3 runs and the PBS job script for the 2-node example. All runs were performed with Bayes EM subpixel and homography pre-alignment.

[general]
default_num_threads = 24
write_pool_size = 15
system_cache_size = 200000000

#PBS -S /bin/bash
#PBS -W group_list=#####
#PBS -q normal
#PBS -l select=2:model=wes
#PBS -l walltime=1:30:00
#PBS -m e

# Load MPI so we have the MPIEXEC command for the Stereo_MPI script
module load mpi-mvapich2/1.4.1/intel

cd /u/zmoratto/nobackup/Mars/CTX/Aeolis_Planum_SE
stereo_mpi P02_002002_1738_XI_06S208W.cal.cub P03_002279_1737_XI_06S208W.cal.cub mpi2/mpi2 --mpiexec 2 --processes 16 --threads-multi 4 --threads-single 24
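For completeness, submitting and monitoring the job is just the usual PBS workflow (the script file name here is made up):

# Submit the job script above and keep an eye on it in the queue.
qsub aeolis_mpi2.pbs
qstat -u $USER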

Come to think of it, I was probably cheating the traditional stereo run by leaving the write pool size set to 15.

Update 2/4/13

I also tried the same experiment with the HiRISE stereo pair of Hrad Vallis that we ship in our binaries. Unfortunately, the single-node runs didn't finish in 8 hours and were shut off. Again, this is homography alignment plus Bayes EM subpixel. Everything would have finished much sooner if I had used parabola subpixel.
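For anyone reproducing this, the subpixel and alignment choices live in stereo.default. A rough sketch of the relevant lines, with the option names recalled from ASP 2.x and therefore to be treated as assumptions:

alignment-method homography   # pre-alignment used for these runs
subpixel-mode 2               # Bayes EM subpixel: slow but high quality
# subpixel-mode 1 would select parabola subpixel: much faster, lower quality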

HiRISE Hrad Vallis processing runs on Westmere Nodes
Command               Walltime   CPU Time    Mem Used   Completed
stereo                08:00:24   106:31:38    2.59 GB   Nope
stereo_mpi --mpi 1    08:00:28   181:55:00   10.0 GB    Nope
stereo_mpi --mpi 6    02:18:19   221:41:52   44.9 GB    Yep

Pleiades Introduction

Here's my presentation PDF. This is a first draft; I'll probably change it a bit more in the future.

Introduction to Pleiades

Don't accept my presentation as the only truth. If you are interested in using the supercomputer or have questions, please refer to the NAS website.