Goodbye and Thanks for all the Fish!

July will be my final month working at NASA Ames Research Center. It has been a great run working under the ill-defined job title as “Geospatial Software Architect”. I’ve gotten a chance to work as Software Dev, a Principal Investigator, and most recently as a Flight Software Lead on a new robot for the ISS called Astrobee. I’m leaving this comfortable blanket I’ve had within NASA for 6 years for a chance to do computer vision at Google for Project Tango. Something I find exciting and terrifying, but I view this as an opportunity to learn ever more.

Despite this new opportunity I feel like I’m leaving my baby, Ames Stereo Pipeline (ASP). Honestly, though, I’ve been doing less and less with ASP for a while now. Oleg has been lead developer for quite sometime. Under his guidance the software has gotten more features that everyone wants and more people have started using it for Earth Science. Clearly Oleg is doing a great job! Recently another coworker, Scott, has started improving ASP as well. With those two on the job, I feel like ASP will continue to grow.

I’m extremely proud that a community has developed around ASP. I’m also grateful to APL UofW and the PGC for putting faith into the software. Their time spent using ASP, requesting changes, and offer solutions to bugs has made ASP a product worthwhile. I’m sad I won’t get to be involved anymore or at least hear about what new applications scientists have thought up. My time involved developing ASP was wonderful and was perfect for honing my skills. I hope others can do the same through the use and understanding of how it works.

Thank you ASP users and the Intelligent Robotics Group. It was fun!

Automated Glacier Modelling from Digital Globe Imagery

David Shean from the University of Washington talks at FOSS4G 2014 about using Ames Stereo Pipeline in his work to autonomously model glaciers at 2 m / px using data from Digital Globe.

Thoughts on improving ASP Stereo

Ames Stereo Pipeline’s (ASP) current integer correlator leaves a bit to be desired. Currently it does poorly in scenes with aggressively changing slopes. It is also a coin flip if it finishes in an hour or several days. So I’ve been working on researching a new correlator by reading, implementing, and applying to satellite imagery a select few of the top performers from the Middlebury stereo competition. I started this a long time ago with PatchMatch and I never gave a good conclusion. Now I will summarize by experiences and give a short introduction into the current solution I’m pursuing.

Algorithm Shoot Out!

Semi Global Matching: [1] This is a world recognized well performing stereo algorithm. I don’t need to say its graces. The cons in my opinion are that it uses a lot of memory and that it is only applicable to 1-D searching. For ASP we like to have 2-D searching solution, or optical flow, to handle flaws in the user’s input data and because some users have actual used us for the creation of velocity maps. We might have been to get around the inaccuracies in our users data and the horrors of linescan cameras by calculating a local epipolar vector for each pixel after a bundle adjustment. But I believe we wouldn’t catch the vertical CCD shifts and jitter seen in HiRISE and World View satellites. As for the memory problem, there have been derivative SGM algorithms to fix this problem, but I didn’t evaluate them.

PatchMatch: [2] I really love the idea of starting with a uniform noise guess for the disparity and then propagating lowest cost scores to the neighbors. There were a couple downsides to this algorithm for satellite processing. 1. The cost metric of absolute differencing intensities and gradients performed much worse than an NCC cost metric in the arctic. 2. The run time was horrible because each pixel evaluation didn’t reuse previous comparison used by neighboring pixels. 3. Their slanted window needed to be adapted to support slants in the vertical direction as well as the horizontal for our optical flow demands. I couldn’t find a formulation that would stop the algorithm from cheating by defining the window as 90 degrees from the image geometry. In other words, the PatchMatch algorithm kept finding out that the correlation score was minimal if you define the kernel as having no area.

Despite all of this, a plain jane implementation of PatchMatch using NCC and non-slanted windows performs the same as a brute force dense evaluation of a template window across all disparity values. This also means that places were brute force search fails, so would PatchMatch. But, maybe for extremely large search ranges, PatchMatch might be worth its processing time. I will keep this in the back of mind forever.

PatchMatch with Huber Regularization: [3] This is a neat idea that is built on top of Steinbruecker and Thomas Pock’s “Large Displacement Optical Flow Computation without Warping” [4]. (Seriously though, Thomas Pock hit a gold mine with lets apply a regularity term to everything in computer vision and show an improvement.) I eventually learned how to implement primal dual convex optimization using Handa’s guide [5]. I realize now that everything I need to know is in Heise’s paper [3], but it took me a long time to understand that. But I never implement exactly what the paper described. They wanted a smoothness constraint applied to both the disparity and the normal vector used to define the correlation kernel. Since I couldn’t define a slanted correlation kernel that worked both in horizontal and vertical directions as seen in PatchMatch, I just dropped this feature. Meaning I only implemented a smoothness constraint with the disparity. Implementing this becomes a parameter tuning hell. I could sometimes get this algorithm to produce a reasonable looking disparity. But if I ran it for a few more iterations, it would then proceed to turn slopes into constant disparity values until it hit a color gradient in the input image. So it became a very difficult question for me of, at what point in the iterations do I get a good result? How do I know if this pretty result is actually a valid measurement and not something the smoothness constraint glued together because it managed to out weight the correlation metric?

In the image I provided above, you can see a slight clustering or stair-casing of the disparity as the smoothness constraint wants disparity values to match their neighbors. Also, random noise spikes would appear and neither the total variance (TV) term or the data term would remove them. They are stable minimas. I wonder if a TVL1 smoothnss term would be better than a TVHuber.

As Rigid As Possible Stereo under Second Order Smoothness Priors: [6] This paper again repeats the idea seen in PatchMatch Huber regularization of having a data term, a regularization term, and theta that with increasing iterations forces the two terms to converge. What I thought was interesting here was their data term. Instead of matching templates between the images for each pixel, instead break the image into quadratic surfaces and then refine the quadratic surfaces. This is incredibly fast evaluating even when using a derivative free Nelder Mead simplex algorithm. Like several orders of magnitude faster. Unfortunately this algorithm has several cons again. 1. They wanted to use the cost metric seen in PatchMatch that again doesn’t work for the satellite imagery of the arctic that I have been evaluating. 2. The data term is incredibly sensitive to its initial seed. If you can’t find a surface that is close to the correct result, the Nelder Mead algorithm will walk away. 3. This algorithm with a smoothness prior is again a parameter tuning hell. I’m not sure that what I tune up for my images will work equally well for the planetary scientists as well as the polar scientists.

Fast Cost-Volume Filtering for Visual Correspondence and Beyond: [7] This is an improvement algorithm to the KAIST paper about Adaptive Support Weights. [8] (Hooray KAIST! Send us more of your grad students.)  They say hey, this is actually a bilateral filter that Yoon is talking about.  They also recently read a paper about performing a fast approximate of the bilateral filter by using a ‘guided’ filter. In the end this is similar to a brute force search except now there is fancy per pixel weighting for each kernel based on image color. This algorithm is easy to implement but fails to scale to larger search regions just like brute force search. Yes this can be applied in a pyramidal fashion but I think in the next section that I’ve hit on a better algorithm. I wouldn’t count this algorithm out all together though. I think it has benefit as a refinement algorithm to the disparity, specifically in cases of urban environments with hard disparity transitions.

What am I pursuing now?

Our users have long known that they could get better results in ASP by first map projecting their input imagery on a prior DEM source like SRTM or MOLA. This reduces the search range. But it also warps the input imagery so that from the perspective of the correlator, the imagery doesn’t have slopes anymore. The downside is that this requires a lot of work on the behalf of the user. They must run a bunch more commands and must also find a prior elevation source. This prior elevation source may or may not correctly register with their new satellite imagery.

My coworker Oleg hit upon an idea of instead using a lower resolution disparity, smoothing it, and then using that disparity to warp the right image to the left before running the final correlation. It’s like map projecting, except with out the maps, camera models, and prior existing elevation source. I’ve been playing with it and made a pyramidal version of this idea. Each layer of the pyramid takes the previous disparity, smooths it, and the warps the right image to the left. Here is an example of a disparity produced with this method up against current ASP correlator’s result. I have single thread rough prototype variant and an in-progress parallel variant I’m working on.

Looks pretty good right? There are some blemishes still that I hope to correct. Surprisingly the parallel implementation of this iterated warping correlator is 2x faster than our current pyramid correlator. Another surprising feature is that the runtime for this mapping algorithm is mostly constant despite the image content. For consecutive pyramid levels, we’ll always be searching a fixed square region, whereas the original ASP pyramid correlator will need to continually adapt to terrain it sees. Once I finish tuning this new algorithm I’ll write another post on exactly why this is the case. There is also a bit of a black art for smoothing the disparity that is used for remapping the right image.


I’m pretty excited again about finding a better correlator for ASP. I still have concerns about how this iterative mapping algorithm will handle occlusions. I also found out that our idea is not completely new. My friend Randy Sargent has been peddling this idea for a while [9]. He even implemented it for the Microscopic Imager (MI) on board the Mars Exploration Rovers. I didn’t even know that software existed! But they used homography matrices, while our ‘new’ idea is using a continuous function. In the end, I hope some of you find my diving into stereo research papers useful. I learned about a lot of cool ideas. Unfortunately very few of them scale to satellite imagery.


[1] Hirschmuller, Heiko. “Accurate and efficient stereo processing by semi-global matching and mutual information.” Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 2. IEEE, 2005.
[2] Bleyer, Michael, Christoph Rhemann, and Carsten Rother. “PatchMatch Stereo-Stereo Matching with Slanted Support Windows.” BMVC. Vol. 11. 2011.
[3] Heise, Philipp, et al. “PM-Huber: PatchMatch with Huber Regularization for Stereo Matching.” Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013.
[4] Steinbrucker, Frank, Thomas Pock, and Daniel Cremers. “Large displacement optical flow computation withoutwarping.” Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
[5] Handa, Ankur, et al. Applications of Legendre-Fenchel transformation to computer vision problems. Vol. 45. Tech. Rep. DTR11-7, Department of Computing at Imperial College London, 2011.
[6] Zhang, Chi, et al. “As-Rigid-As-Possible Stereo under Second Order Smoothness Priors.” Computer Vision–ECCV 2014. Springer International Publishing, 2014. 112-126.
[7] Rhemann, Christoph, et al. “Fast cost-volume filtering for visual correspondence and beyond.” Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
[8] Yoon, Kuk-Jin, and In So Kweon. “Adaptive support-weight approach for correspondence search.” IEEE Transactions on Pattern Analysis and Machine Intelligence 28.4 (2006): 650-656.
[9] Sargent, Randy, et al. “The Ames MER microscopic imager toolkit.” Aerospace Conference, 2005 IEEE. IEEE, 2005.

Patch Match Stereo

3 months ago during my Semi Global Matching post, I mentioned I would have a follow along post about another algorithm I was interested in. That algorithm is PatchMatch, and is in my opinion a better fit for satellite processing. It was first written for hole filling [1] and then later it was applied to stereo [2]. Unlike Ames Stereo Pipeline’s currently implemented hierarchical discrete correlator, PatchMatch easily applies to high dimensionality searching and it has a run time that corresponds to number of iterations and image size. Not image content, which I hope interprets into predictable run times. Unfortunately my current implementation is really slow and has several faults. However my hope is that eventually we’ll make it fast and that it’s predictability will enable our users to budget for the CPU cost of creating a DEM while still improving quality.

How it Works

The algorithm for PatchMatch is actually very simple. The disparity image is seeded with uniform noise. That noise is random guesses for what the disparity should be. However since every pixel is another guess, and the image is quite a bit larger than the search range we are looking over, a handful of these guesses are bound to be close to correct. We then propagate disparities with low cost to the neighboring disparities.  The cycle repeats by then adding more noise to the current propagated disparity but at half the previous amplitude. This cycle repeats until everything converges. Surprisingly, a recognizable disparity appears after the first iteration.

If you are having a hard time visualizing this, here is a video from the original authors of Patch Match Stereo.

PatchMatch can be quick in the hole filling application and probably for stereo from video. (Propagating between sequential frames can greatly help convergence.) Unfortunately it is quite slow in use to pairwise stereo. When performing stereo correlation in ASP 2.3, we group kernel evaluations by being at the same disparity value (sometimes called disparity plane in literature). This means that overlapping kernel evaluations will reuse pixel comparisons. The reuse of pixel comparisons is a huge speed boost. My implementation of PatchMatch has none of that. My implementation is also solving for a floating-point precision of disparity. While this gives me very detailed disparity maps, the downside is that my implementation spends 75% of its time performing interpolation. I think for this algorithm to become useful for researchers, I’ll need to discretize the disparities and prerender the input as super sampled to avoid repeated interpolation calculations.

I have one final statement to make on PatchMatch algorithm. In practice, PatchMatch produces results that are very similar to performing a brute force search over the entire search range where winner (lowest cost) takes all. That is different from ASP’s hierarchal search, which at each pyramid level falls into local minimums. It is only the hierarchal part of it that has any use in finding the global. What this means for PatchMatch is that we need to use a cost metric that is globally minimal. For narrow baseline stereo in a controlled lighting environment, the fast Sum of Absolute Differences (SAD) fits the bill. But in the cruel realities of satellite imagery, the only solution is Normalized Cross Correlation (NCC). Unfortunately, NCC is quite a bit slower to evaluate than SAD. Also, in cases where SAD worked, NCC performs worse (probably due to being sensitive to the calculation of mean?).

Where PatchMatch does poorly

I’ve already hit my primary concern, which is the speed of Patch Match. However I think if we view PatchMatch as replacement for both correlation and BayesEM, it already appears cost effective. A different concern is what happens when the search range’s area is larger than the area of the image being correlated. A real world example would be rasterizing a 256^2 tile for a WV2 image where the search range is 1400×100. The area of the tile is less than the search’s. The number of close valid guesses seeded by the initial uniform noise drops dramatically. It might be a good idea to then take the interest points that ASP finds and dump them in the initial random disparity guess that PatchMatch evaluates. This would insure there is a disparity to propagate from.

Previously I mentioned that PatchMatch in my experiments seems to behave as a brute force winner takes all algorithms. This means that increase the search size also means a decrease in quality because our odds of finding an outlier with less cost than the actually correct match have gone up. So maybe completely abandoning hierarchal search entirely is a bad idea.  Other ideas might be enforcing a smoothness constraint that would heavily penalize the random jumping that characterizes outliers. The enforcement of smoothness constraint was the driving force behind the power of Semi Global Matching.

Expanding the Algorithm

Since kernel evaluations in PatchMatch are individual problems and there is no shared arithmetic like there is in ASP’s stereo correlation, it makes it easy to add on some advance algorithms to PatchMatch. The original PatchMatch Stereo paper [#] mentioned adding on parameters to the disparity maps so that an affine window could be used for matching. This would improve results on slopes and I believe would be a better alternative to prior map projecting the input imagery.

Another idea mentioned in the paper was adding Adaptive Support Weights (ASW) [3] to each kernel evaluation. It adds weighting to pixels that match the color of the center of the kernel in addition to weighting central pixels more importantly than pixels at the edge. The idea is that pixels of the same color are likely to be at the same disparity. While not always correct, it definitely applies to scenarios where the top of a building is a different color than its sides. Implementations I show in my figures operate only on grayscale imagery and not color like the original paper proposed.

In practice, this additional weighting does a good job at preserving edges at depth discontinuity. This is important for cliffs and for buildings. A proposed improvement is geodesic adaptive support weights, which weighs same color pixel heavily that are connected to the central pixels. This fixes problems like a picture of blades of grass, where the blades have the same color but are actually at different disparities.

Wrapping back around to the idea that Patch Match needs to have a cost metric that is globally minimal. It might be worth while exploring different cost metric such as Mutual Information or if I remember correctly, the fast evaluating Matching by Tone Mapping (MTM) [4]. Nice side effect of this would be that correlation is completely lighting invariant and could even correlate between infrared and visible.

Additional Comparisons

Here’s a disparity image of NASA Ames Research Center captured from a Pleiades satellite. Whatever they do to pre-process their imagery, I couldn’t epipolar rectify that imagery. Search range given to both ASP and PatchMatch was [-40,-20,70,40]. Despite being noisy, I like how the PatchMatch w/ ASW preserved straight edges. The specularity of some of the roof lights on a hanger however really threw ASW for a loop.

I also processed a crop from a World View 2 image of somewhere in the McMurdo Dry Valleys. This really shot a hole in my argument that Patch Match would be completely invariant to search range. The search range was [-700,-50,820,50]. I did however have to hand tune the search range to get this excellent result from ASP. The automatic detection initially missed the top of the mountains.

Source Code

I’m being messy because this is my research code and not something production. You’ll need to understand VW and Makefiles if you want to compile and modify.

Further Research

I’m still diving through papers during my free evenings. I don’t do this at work because I have another project that is taking my soul. However there appears to be a good paper called Patch Match Filter that tries to improve speed through super pixel calculation [5]. There is also an implementation of PatchMatch that adds a smoothness constraint that performs better than the original PatchMatch paper [6]. When it came out, it was the best performing algorithm on the Middlebury dataset. However, recently another graph cut algorithm dethroned it. I myself will also just look at applying patch match as a refinement to a noisy disparity result from ASP 2.3 using a kernel size of 3. Having a small kernel still evaluates extremely quickly even if the search range is huge.

I’m sorry this post doesn’t end in a definitive conclusion. However I hope you found it interesting.


[1] Barnes, Connelly, et al. “PatchMatch: a randomized correspondence algorithm for structural image editing.” ACM Transactions on Graphics-TOG 28.3 (2009): 24.
[2] Bleyer, Michael, Christoph Rhemann, and Carsten Rother. “PatchMatch Stereo-Stereo Matching with Slanted Support Windows.” BMVC. Vol. 11. 2011.
[3] Yoon, Kuk-Jin, and In So Kweon. “Adaptive support-weight approach for correspondence search.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 28.4 (2006): 650-656.
[4] Hel-Or, Yacov, Hagit Hel-Or, and Eyal David. “Fast template matching in non-linear tone-mapped images.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[5] Lu, Jiangbo, et al. “Patch Match Filter: Efficient Edge-Aware Filtering Meets Randomized Search for Fast Correspondence Field Estimation.” Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference. 2013.
[6] Heise, Philipp, et al. “PM-Huber: PatchMatch with Huber Regularization for Stereo Matching.”

PC Align

Last month we had a new release of Ames Stereo Pipeline, version 2.3! We’ve performed a lot of bug fixing and implementing new features. But my two new prized features in ASP are pc_align and lronac2mosaic. Today I’d like to only introduce pc_align, a utility for registering DEMs, LIDAR points, and ASP point clouds to each other. All you have to do is specify an input, a reference, and an approximate estimate for how bad you think the misplacement is.

Does that sound like magic? Under the hood, pc_align is performing an implementation of the iterative closest point algorithm (ICP). Specifically we are using internally libpointmatcher library from ETH. What ICP does is iteratively attempt to match every point of the input with the nearest neighbor in the reference point cloud. ICP then solves for a transform that would globally reduce the distances between the current set of matches. Then it repeats itself and performs a new round of matching input to reference and then again solves for another global step. Repeat, repeat, repeat until we no longer see any improvements in the sum of distances between matches.

This means that failure cases for PC_align are when the reference set is too coarse to describe the features seen in the input. An example would be having a HiRISE DEM and then only having a single orbit of MOLA that intersects. A single line of MOLA does nothing to constrain the DEM about the axis of the shot line. The 300 meter post spacing might also not be detailed enough to constrain a DEM that is looking at small features such as dunes on a mostly flat plane. What is required is a large feature that both the DEM and the LIDAR source can resolve.

CTX to HRSC example

Enough about how it works. Examples! I’m going to be uncreative and just process CTX and Gale Crater because they’re fast to process and easy to find. I’ve processed the CTX images P21_009149_1752_XI_04S222W, P21_009294_1752_XI_04S222W, P22_009650_1772_XI_02S222W, and P22_009716_1773_XI_02S223W using stereo options “–alignment affineepipolar –subpixel-mode 1”. This means correlation happens in 15 minutes and then triangulation takes an hour because ISIS subroutines are not thread safe. I’ve plotted these two DEMs on top of a DLR HRSC product, H1927_0000_DT4.IMG. It is important to note that this version of the DLR DEM is referenced against the Mars ellipsoid and not the Aeroid. If your data is referenced against the Aeroid, you’ll need to use dem_geoid to temporarily remove it for processing with pc_align who only understand ellipsoidal datums.

In the above picture, the two ASP created DEMs stick out like sore thumbs and are misplaced by some 200 meters. This is due to pointing information for MRO and subsequently CTX being imperfect (how much can you ask for anyway?). You could run jigsaw and that is the gold standard solution, but that takes a lot of manual effort. So instead let’s use pc_align with the commands shown next.

> pc_align --max-displacement 200 H1927_0000_DT4.tif P22-PC.tif \
        --save-transformed-source-points  -o P22_align/P22_align \
> point2dem --t_srs "+proj=sinu +lon_0=138 +x_0=0 +y_0=0 \
        +a=3396000 +b=3396000 +units=m +no_defs" \
        --nodata -32767 P22_align/P22_align-trans_source.tif

There are 3 important observations to make from the command line above. (1) We are using the max displacement option to set the upper bound of how bad we think we are, 200 meters. (2) I’m feeding the ASP PC file as the input source instead of ASP’s DEM. This is because in (3) with the save transformed source points we’ll be writing out another PC file. PC align can only export PC files so we always have to perform another round of point2dem. It is possible to run stereo -> point2dem -> PC Align -> point2dem, but it means you are unneccesarily resampling your data once. Using the PC file directly from stereo saves us from potential aliasing and removes a point2dem call.

Here’s the final result where both CTX DEMs are plotted on top of the HRSC DEM. Everything looks really good except for that left edge. This might be because the DEM was rendered with incorrect geometry and the output DEM is subtly warped from a perfect solution.

Another cool feature that pc_align author Oleg Alexandrov added was recording the beginning and ending matching errors in CSV files. They’re found with the names <prefix>-{beg,end}_errors.csv. You can load those up in QGIS and plot theirs errors to visualize that pc_align uniformly reduced matching error across the map. (Thanks Ross for showing me how to do this!)


Quite a few MOLA shots can be found inside the CTX footprints. Above is a plot of all MOLA PEDR data for Gale crater. I was able to download this information in CSV format from the MOLA PEDR Query tool from Washington University St. Louis. Conveniently PC_align can read CSV files, just not in the format provided by this tool. PC_align is expecting the data to be in format long, lat, elevation against ellipsoid. What is provide is lat, long, elevation against aeroid, and then radius. So I had to manually edit the CSV in Excel to be in the correct order and create my elevation values by subtracting 3396190 (this number was wrong in first draft) from the radius column. The other added bit of information needed with CSV files is that you’ll need to define the datum to use. If you don’t, pc_align will assume you’re using WGS84.

> pc_align --max-displacement 200 P22-PC.tif mola.csv\
        -o P22_mola/P22_mola --datum D_MARS --save-inv-trans \
> point2dem --t_srs "+proj=sinu +lon_0=138 +x_0=0 +y_0=0 \
        +a=3396000 +b=3396000 +units=m +no_defs" \
        --nodata -32767 P22_mola/P22_mola-trans_reference.tif

Two things to notice in these commands, the inputs are backwards from before and I’m saving the inverse transform.  You can keep things in the same order as when I was aligning to HRSC,  it is just that things will run very slowly. For performance reasons, the denser source should be considered the reference and then you must request the reference to be transformed to the source. You’ll likely routinely be using this inverse form with LIDAR sources.

In the end I was able to reduce alignment error for my CTX DEMs from being over 50 meters to being less than 15 meters against MOLA and from over 100 meter to 40 meters error against HRSC. A result I’m quite happy with for a single night processing at home. You can see my final composited MOLA registered CTX DEMs on the left. The ASP team will have more information about pc_align in LPSC abstract form next year. We also hope that you try out pc_align and find it worth regular use in your research.


I goofed in the MOLA example! Using D_MARS implies a datum that is a sphere with 3396190 meter radius. I subtracted the wrong number from MOLA’s radius measurement before (the value 3396000). That probably had some effect on the registration result shown in the pictures, but this mistake is smaller than the shot spacing of MOLA. Meaning the horizontal registration is fine, but my output DTMs are 190 meters higher than they should have been. FYI, D_MOON implies a datum that is a sphere with radius 1737400 meters.

Advances in LRO-NAC processing in ASP

Since I last wrote, we’ve hired a new full-time employee. His name is Scott and we assigned him the task of learning ASP and LROC. The first utilities he’ll be contributing back to ASP are lronacjitreg and The first utility is very similar to the ISIS utility with a similar name designed for HiRISE. The second utility,, uses the first tool and can take LRO-NAC EDR imagery and make a non-projected image mosaic. What does internally is ‘noproj’ and then ‘handmos’ the images together. There is an offset between the images due to model imperfections. Finding the correct offset so the combined images are seamless is the task of lronacjitreg. All of this just a streamed line version of what I wrote in a past blog post.

Previously users and our team only had the option to run all 4 combinations of the 4 LRO-NAC input files through ASP and then glue them together afterwards. Now with the use of lronac2mosaic, we can feed whole LRO-NAC observations into ASP and receive the full DTM in one go. No messy mosaicking of 4 files.

I’ve used Scott’s program successfully to recreate most DTMs that ASU has made via SOCET SET. Using my home server, I’ve been able to recreate 77 of their DTMs to date. We’ve been fixing bugs as we hit them. One of the biggest was in our search range guessing code. The next upcoming release of ASP will have the fruits of that labor. Previously ASP had a bad habit of ignoring elevation maximas in the image as it thought those IP measurements were noise. Now we should have a better track record of getting measurements for the entire image.

One of the major criticisms I’m expecting from the large dump of LRO-NAC DTMs we expect to deliver next year is what is the quality of the placement of our DTMs in comparison to LOLA. Another engineer we have on staff, Oleg, has just the solution for this. He has developed an iterative closest point (ICP) program called pc_align which will be in the next release. This is built on top of ETHZ Autonomous System Lab’s libpointmatcher and has the ability to take DTMs and align them to other DTMs or LIDAR data. This enables us to create well-aligned products that have height values agreeing within tens of meters with LOLA. Our rough testing shows us having a CE90 of 4 meters against LOLA after performing our corrections.

We’re not ready for the big production run yet. One problem still plaguing our process is that we can see the CCD boundaries in our output DTMs. We believe most of this problem is due to the fact that the angle between line of sight of the left and right CCDs changes with every observation. ISIS however only has one number programmed into it, the number provided by the FK. Scott is actively developing an automated system to determine this angle and to make a custom FK for every LRO-NAC observation. The second problem we’re tracking is that areas of high slope are missing from our DEMs. This is partially because we didn’t use Bayes EM for our test runs but it also seems like our disparity filtering is overly aggressive or just wrong. We’ll get on to that. That’s all for now!

Semi-Global Matching

My co-worker Oleg Alexandrov has been working on Ames Stereo Pipeline for a while now. He’s just about touched all parts of the code. This is crystal clear when you look at our logs on Github. One of things he ribs me most about in ASP is that he doesn’t like that we advertise ASP as using up-to-date stereo correlation algorithms. “Come ‘on Man! That’s just not true!” he tells me. Depending on whom you talk to, we’re using primo 90’s research[7] or something re-hashed from the 70’s[1]. Either way, it is clear, we haven’t been tracking along with the current research in our field when it comes to integer correlation. This blog post covers the first part of my own personal research to find a new correlator algorithm that would improve ASP in terms of both runtime and quality. In this post, I’ll be reviewing an algorithm called Semi-Global Matching.

Semi-Global Matching or SGM is a method developed by Heiko Hirschmueller from the DLR. He first wrote about this in his 2005 paper[2]. He then elaborated and proposed further improvements in [3][4]. The best paper to learn the method is probably his second paper[3]. In my opinion his first paper gets side tracked in using a Mutual Information (MI) cost metric when the interesting bit is just SGM. The most exciting bit about this work is that it comes from DLR and is an algorithm they have applied to aerial and satellite mapping. I believe this is the method that was used to create the wonderful HRSC DTMs that some how managed to overcome the weird JPEG artifacts in their raw imagery.

The Algorithm

Heiko might have sped over his SGM algorithm in his first paper because he didn’t view it as being as challenging to implement when compared to the MI cost metric. SGM shares a lot in common with scanline optimization stereo, which has had a lot of prior research but now-a-days is considered a dead end. Let’s review how that worked. Also, the images used for this testing are from the Middlebury Stereo dataset. More information about this data and stereo algorithms applied to them can be found in [8][9][10][11].

Scanline optimization stereo is essentially Viterbi decoding in my mind. We evaluate along an epipolar line. In the case of a rectified image, this is along the horizontal scanline. For each pixel along that scanline we evaluate each possible disparity result. The costs of each pixel along the scanline can then be stacked into a matrix. A scanline was highlighted in the above picture. The cost of each pixel (x-direction) versus each possible disparity value (y-direction) is shown in the picture below. The solution for the disparity along this scanline is then the path through this matrix/image that has minimum costs (dark areas). We also have to include some smoothness constraint otherwise our disparity result could follow the jagged jumps in this map that don’t represent reality.

Finding the minimum path is then an application of Linear Programming. We iterate through the matrix left to right and take a rolling sum. The cost of an element in the rolling sum vector for the current pixel and disparity combination is equal to the cost for the current location plus the lowest summed cost from the set of all possible disparities for the prior pixel location. Heiko applies some additional constraints in that he penalizes the cost when ever the disparity changes. He penalizes higher for multiple disparity value transitions than he does for 1. Penality for an increment of 1 in disparity is P1 and anything greater is P2. This entire paragraph can more elegantly be described in the following equation.

Applying this forward and backward for each scanline we can solve for a disparity map. Here’s an example.

Notice there’s a lot of tearing between the scanlines. The image looks as if we had tracking error on a VCR. We could fix this by using a larger kernel size. For the above, the kernel size was 1 x 1 px. Something more unique would insure matches that are constrained between lines. Another approach would be to insure some smoothness constraint across lines as opposed to just disparity transitions. Heiko’s solution to this issue is what makes SGM what it is. He opted to instead perform scanline optimization at multiple angles and then take the summed cost vector to determine the final disparity result. Note, that even though we evaluate the scanline along an angle, the disparity is still defined as going along the epipolar line (perfectly horizontal in this case). Each line direction produces results like the following:

The sum of their cost vectors and then taking the minimum produces a beautiful result like the following:

My Implementation and Results

All of the pictures above were created with my implementation of SGM. In my version, I only evaluate 8 line directions. So my results are noisier than what’s seen in Heiko’s original paper. Despite this, the end results are pretty impressive. Here’s line up of ASP result, my SGM result, Heiko’s result, and the ground truth result. ASP performs so badly because it has a large kernel size that can’t handle the sudden jumps in depth. ASP then blurs the disparity discontinuities.

Unfortunately I must mention the bad side of this method. There are several cons the first and weakest of arguments is the required CPU time. My implementation of this method takes about 23 seconds to evaluate this with 8 paths. 16 paths like the paper would have doubled the processing time. ASP chops through this image in seconds. Heiko says he got the processing time down to 1.3 seconds in 2005. So I’m doing something horribly wrong and could improve my implementation. However speed is always an issue, some ideas to address this issue are iSGM[5] and wSGM[6]. These are hierarchical methods of SGM and fancy maps to reduce the length required to integrate paths for cost evaluation.

A bigger issue is that SGM requires an absurd amount of memory. All costs for all pixels and all possible disparity values are evaluated up front in a big tensor that has a size of W * H * D * 1 byte. We also need a copy of this tensor for evaluating paths and another to store summing for all paths. Those are two allocations of memory that are W * H * D * 2 bytes. They need to be a higher data type to avoid integer rollover artifacts. This demo set is 450 x 375 px and I evaluated it across 64 possible disparities. Thus SGM required 51 MB. That doesn’t include the memory cost of just loading the images up and allocating space for the disparity result. Imagine tackling a satellite image where we we have a disparity range of 2000 pixels.

Another pesky complaint against SGM is how to figure out what the values should be for the two penalty arguments. Heiko never mentioned what he used; likely he tuned the values for each stereo pair to get best results. However these penalty values ultimately determine how this algorithm responds to occlusion and angled surfaces. What works for World View 1 in glacier regions (low frequencies) might not necessarily apply to World View 1 in the city (square wave patterns). In practice, we would want to have tuned parameters for each instrument we work on and for each type of terrain.

The final and most harsh criticism of SGM is that it can only be applied to 1D disparity searches and the range must be defined beforehand. 1D searches work for calibrated stereo rigs such as the imagery used in this post. However it is my opinion that real data always has imperfections and finding the true disparity requires searching in the Y direction still. Examples of this are linescan cameras that have jitter but the spacecraft ephemeris isn’t sampled high enough to capture such as MOC, HiRISE, and LROC. There’s also the case were the camera isn’t perfect such as the World View cameras where there is a subpixel misregistration of all 50 CCDs. They can’t be easily corrected for because we can’t have raw imagery. ASP also doesn’t have a perfect method for epipolar rectification of linescan cameras. We have a linear approximation with our affine method but the problem is nonlinear.

SGM is still an amazing algorithm that is incredibly useful. There are ton of papers out there that find it to be perfect for their applications. Beside the incredible detail it resolves, my other favorite bit about the algorithm is that its runtime is deterministic. It depends squarely on search range and there is no worst-case path versus best-case path that we have to deal with in ASP’s binary search approach. Despite this, SGM seems to be a non-ideal match for ASP. ASP hopes to address the generic correlation problem where we don’t always trust our camera information or our data. I want something that can still handle 2D searching. In my next post I’ll show off another promising algorithm that seems to address that concern along with runtime and memory requirements. Until then, have some music.

Update: Code is now available here.

Works Cited

[1]  Barnea, D. (1972). A Class of Algorithms for Fast Digital Image Registration. IEEE Transactions on Computers.
[2]  Hirschmuller, H. (2005). Accurate and Efficient Stereo Processing by Semi Global Matching and Mutual Information. CVPR .
[3]  Hirschmuller, H. (2008). Stereo Processing by Semiglobal Matching and Mutual Information. Pattern Analysis and Machine Intelligence .
[4]  Hirschmulller, H., Buder, M., & Ernst, I. (2012). Memory Efficient Semi-Global Matching. Remote Sensing and Spatial Information Sciences .
[5]  Klette, S. H. (2012). Iterative Semi-Global Matching for Robust Driver Assistance Systems. ACCV .
[6]  Spangenberg, R., Langner, T., & Rojas, R. (2013). Weighted Semi-Global Matching and Center-Symmetric Census Transform for Robust Driver Assistance. CAIP .
[7]  Sun, C. (1997). A Fast Stereo Matching Method. Digital Image Computing: Techniques and Application.
[8] Scharstein, D., Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision .
[9] Scharstein, D., Szeliski, R. (2003). High-accuracy stereo depth maps using structured light. CVPR .
[10] Scharstein, D., Pal, C. (2007). Learning conditional random fields for stereo. CVPR .
[11] Hirschmuller, H., Scharstein, D. (2007). Evaluation of cost functions for stereo matching. CVPR .