Thoughts on improving ASP Stereo

Ames Stereo Pipeline’s (ASP) current integer correlator leaves a bit to be desired. It currently does poorly in scenes with aggressively changing slopes, and it is a coin flip whether it finishes in an hour or in several days. So I’ve been researching a new correlator by reading, implementing, and applying to satellite imagery a select few of the top performers from the Middlebury stereo competition. I started this a long time ago with PatchMatch and never gave a good conclusion. Now I will summarize my experiences and give a short introduction to the current solution I’m pursuing.

Algorithm Shoot Out!

Semi Global Matching: [1] This is a world-recognized, well-performing stereo algorithm; I don’t need to sing its praises. The cons, in my opinion, are that it uses a lot of memory and that it is only applicable to 1-D searching. For ASP we like to have a 2-D search solution, or optical flow, to handle flaws in the user’s input data and because some users have actually used us for the creation of velocity maps. We might have been able to get around the inaccuracies in our users’ data and the horrors of linescan cameras by calculating a local epipolar vector for each pixel after a bundle adjustment. But I believe we wouldn’t catch the vertical CCD shifts and jitter seen in HiRISE and World View satellites. As for the memory problem, there have been derivative SGM algorithms that fix it, but I didn’t evaluate them.
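
For reference, the core of SGM is aggregation of a per-pixel matching cost along many 1-D paths through the image. Schematically (following [1]), for a path direction r the aggregated cost at pixel p and disparity d is:

\[
L_r(\mathbf{p}, d) = C(\mathbf{p}, d)
+ \min\!\Big( L_r(\mathbf{p}-\mathbf{r}, d),\;
              L_r(\mathbf{p}-\mathbf{r}, d\pm 1) + P_1,\;
              \min_{k} L_r(\mathbf{p}-\mathbf{r}, k) + P_2 \Big)
- \min_{k} L_r(\mathbf{p}-\mathbf{r}, k)
\]

Both C and the summed L_r live on a width by height by disparity-range volume, which is where the memory goes, and the recursion runs over a scalar disparity index, which is where the 1-D search limitation comes from.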

PatchMatch: [2] I really love the idea of starting with a uniform-noise guess for the disparity and then propagating the lowest-cost scores to neighbors. There were a couple of downsides to this algorithm for satellite processing. 1. The cost metric of absolute differences of intensities and gradients performed much worse than an NCC cost metric in the arctic. 2. The run time was horrible, because each pixel evaluation didn’t reuse the comparisons already made by neighboring pixels. 3. Their slanted window needed to be adapted to support slants in the vertical direction as well as the horizontal for our optical-flow demands. I couldn’t find a formulation that would stop the algorithm from cheating by defining the window as 90 degrees from the image geometry. In other words, the PatchMatch algorithm kept finding that the correlation score is minimal if you define the kernel as having no area.
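
To make the propagation idea concrete, here is a minimal, single-threaded sketch of PatchMatch-style search for a 2-D (optical-flow style) disparity. The ncc_cost callback is an assumed stand-in for a fixed, non-slanted NCC window comparison; none of this is ASP code.

#include <cstdlib>
#include <vector>

struct Match { float dx, dy, cost; };

// d must already have W*H entries; ncc_cost(x, y, dx, dy) scores a window at (x, y)
// shifted by (dx, dy) into the right image, lower being better.
void patchmatch(int W, int H, int iters, float radius,
                float (*ncc_cost)(int x, int y, float dx, float dy),
                std::vector<Match>& d) {
  auto rnd = [](float r) { return (2.0f * std::rand() / RAND_MAX - 1.0f) * r; };
  // 1. Start from uniform noise over the whole search range.
  for (int y = 0; y < H; ++y)
    for (int x = 0; x < W; ++x) {
      Match& m = d[y * W + x];
      m.dx = rnd(radius);
      m.dy = rnd(radius);
      m.cost = ncc_cost(x, y, m.dx, m.dy);
    }
  for (int it = 0; it < iters; ++it) {
    for (int y = 0; y < H; ++y)
      for (int x = 0; x < W; ++x) {
        Match& m = d[y * W + x];
        // 2. Propagation: try the left and upper neighbors' disparities here.
        const int nx[2] = { x - 1, x }, ny[2] = { y, y - 1 };
        for (int k = 0; k < 2; ++k) {
          if (nx[k] < 0 || ny[k] < 0) continue;
          const Match& n = d[ny[k] * W + nx[k]];
          float c = ncc_cost(x, y, n.dx, n.dy);
          if (c < m.cost) { m.dx = n.dx; m.dy = n.dy; m.cost = c; }
        }
        // 3. Random search: perturb the current best with a shrinking radius.
        for (float r = radius; r > 0.5f; r *= 0.5f) {
          float cdx = m.dx + rnd(r), cdy = m.dy + rnd(r);
          float c = ncc_cost(x, y, cdx, cdy);
          if (c < m.cost) { m.dx = cdx; m.dy = cdy; m.cost = c; }
        }
      }
    // A full implementation also alternates the scan direction each iteration.
  }
}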

Despite all of this, a plain-jane implementation of PatchMatch using NCC and non-slanted windows performs the same as a brute-force dense evaluation of a template window across all disparity values. This also means that in places where brute-force search fails, so would PatchMatch. But for extremely large search ranges, PatchMatch might be worth its processing time. I will keep this in the back of my mind forever.

PatchMatch with Huber Regularization: [3] This is a neat idea built on top of Steinbruecker and Thomas Pock’s “Large Displacement Optical Flow Computation without Warping” [4]. (Seriously though, Thomas Pock hit a gold mine with “let’s apply a regularity term to everything in computer vision and show an improvement.”) I eventually learned how to implement primal-dual convex optimization using Handa’s guide [5]. I realize now that everything I needed to know is in Heise’s paper [3], but it took me a long time to understand that. I never implemented exactly what the paper described, though. They wanted a smoothness constraint applied to both the disparity and the normal vector used to define the correlation kernel. Since I couldn’t define a slanted correlation kernel that worked in both horizontal and vertical directions, as seen in PatchMatch, I just dropped this feature, meaning I only implemented a smoothness constraint on the disparity. Implementing this becomes a parameter-tuning hell. I could sometimes get this algorithm to produce a reasonable-looking disparity, but if I ran it for a few more iterations it would proceed to turn slopes into constant disparity values until it hit a color gradient in the input image. So it became a very difficult question: at what point in the iterations do I have a good result? How do I know that this pretty result is actually a valid measurement and not something the smoothness constraint glued together because it managed to outweigh the correlation metric?
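
Schematically, the disparity-only version I describe above couples a Huber-TV smoothed field u to the raw disparity d and drives the coupling parameter θ toward zero over the iterations (my simplified notation, not the full formulation in [3], which also regularizes the normals):

\[
E(u, d) = \int_{\Omega} \lVert \nabla u \rVert_{H_\varepsilon}\, dx
        + \frac{1}{2\theta} \int_{\Omega} (u - d)^2\, dx
        + \lambda \int_{\Omega} C\big(x, d(x)\big)\, dx
\]

λ weighs the correlation cost against the smoothness term, and that balance is exactly where the parameter-tuning hell lives: too small and slopes get flattened into constant disparities, too large and the noise spikes survive.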

In the image I provided above, you can see a slight clustering or stair-casing of the disparity as the smoothness constraint pulls disparity values to match their neighbors. Also, random noise spikes would appear that neither the total variation (TV) term nor the data term would remove. They are stable minima. I wonder if a TV-L1 smoothness term would be better than TV-Huber.

As Rigid As Possible Stereo under Second Order Smoothness Priors: [6] This paper repeats the idea seen in PatchMatch with Huber regularization of having a data term, a regularization term, and a theta that, with increasing iterations, forces the two terms to converge. What I thought was interesting here was the data term. Instead of matching templates between the images for each pixel, they break the image into quadratic surfaces and then refine those surfaces. This is incredibly fast to evaluate, even when using a derivative-free Nelder-Mead simplex algorithm; like several orders of magnitude faster. Unfortunately this algorithm again has several cons. 1. They wanted to use the cost metric seen in PatchMatch, which again doesn’t work for the satellite imagery of the arctic that I have been evaluating. 2. The data term is incredibly sensitive to its initial seed. If you can’t find a surface that is close to the correct result, the Nelder-Mead algorithm will walk away. 3. This algorithm with a smoothness prior is again a parameter-tuning hell. I’m not sure that what I tune up for my images will work equally well for the planetary scientists as for the polar scientists.
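
For illustration, a quadratic disparity surface over an image segment is just six numbers (this is my parameterization for the sake of the example, not necessarily the paper’s exact one), and those six coefficients are what the Nelder-Mead simplex refines against the matching cost:

\[
d(x, y) = c_0 + c_1 x + c_2 y + c_3 x^2 + c_4 x y + c_5 y^2
\]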

Fast Cost-Volume Filtering for Visual Correspondence and Beyond: [7] This is an improvement on the KAIST paper about adaptive support weights [8]. (Hooray KAIST! Send us more of your grad students.) They say, hey, what Yoon is describing is actually a bilateral filter. They also recently read a paper about performing a fast approximation of the bilateral filter using a ‘guided’ filter. In the end this is similar to a brute-force search, except now there is fancy per-pixel weighting for each kernel based on image color. This algorithm is easy to implement, but it fails to scale to larger search regions just like brute-force search. Yes, this can be applied in a pyramidal fashion, but I think in the next section that I’ve hit on a better algorithm. I wouldn’t count this algorithm out altogether, though. I think it has value as a refinement algorithm for the disparity, specifically in cases of urban environments with hard disparity transitions.
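
Stated roughly, the method builds the same brute-force cost volume and then filters each disparity slice with edge-aware weights computed from the reference image I (guided-filter weights in [7], bilateral weights in [8]) before the usual winner-take-all selection:

\[
\tilde{C}(\mathbf{p}, d) = \sum_{\mathbf{q}} W_{\mathbf{p},\mathbf{q}}(I)\, C(\mathbf{q}, d),
\qquad
d^{*}(\mathbf{p}) = \arg\min_{d} \tilde{C}(\mathbf{p}, d)
\]

Every disparity slice of the volume still has to be built and filtered, so the work grows linearly with the search area, which is why it scales no better than brute force.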

What am I pursuing now?

Our users have long known that they can get better results in ASP by first map-projecting their input imagery onto a prior DEM source like SRTM or MOLA. This reduces the search range. It also warps the input imagery so that, from the perspective of the correlator, the imagery doesn’t have slopes anymore. The downside is that this requires a lot of work on the user’s part. They must run a bunch more commands and must also find a prior elevation source, which may or may not register correctly with their new satellite imagery.

My coworker Oleg hit upon the idea of instead using a lower-resolution disparity, smoothing it, and then using that disparity to warp the right image to the left before running the final correlation. It’s like map projecting, except without the maps, camera models, and pre-existing elevation source. I’ve been playing with it and made a pyramidal version of this idea. Each layer of the pyramid takes the previous disparity, smooths it, and then warps the right image to the left. Here is an example of a disparity produced with this method up against the current ASP correlator’s result. I have a rough single-threaded prototype and an in-progress parallel variant that I’m working on.
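
Here is a rough, single-threaded sketch of the idea, just to pin down the structure. The image operations are passed in as callbacks because they stand in for whatever blur, warp, and fixed-range correlation routines an implementation provides; none of these names are real ASP functions.

#include <cstddef>
#include <functional>
#include <vector>

struct Image { int w = 0, h = 0; std::vector<float> px; };
struct Disp  { int w = 0, h = 0; std::vector<float> dx, dy; };

struct Ops {
  std::function<Image(const Image&)>                   downsample2x;  // halve resolution
  std::function<Disp(const Disp&)>                     smooth;        // e.g. Gaussian blur of dx, dy
  std::function<Disp(const Disp&)>                     upsample2x;    // double resolution and values
  std::function<Image(const Image&, const Disp&)>      warp_to_left;  // resample right via disparity
  std::function<Disp(const Image&, const Image&, int)> correlate;     // fixed +-radius square search
};

Disp warp_pyramid_correlate(const Image& left, const Image& right,
                            int levels, int radius, const Ops& op) {
  // Build the image pyramids.
  std::vector<Image> L{left}, R{right};
  for (int i = 1; i < levels; ++i) {
    L.push_back(op.downsample2x(L.back()));
    R.push_back(op.downsample2x(R.back()));
  }
  // Coarsest level: plain correlation over the small fixed range.
  Disp d = op.correlate(L.back(), R.back(), radius);
  // Each finer level: smooth the prior disparity, warp the right image with it,
  // then only search the small residual range against the warped image.
  for (int i = levels - 2; i >= 0; --i) {
    Disp prior   = op.upsample2x(op.smooth(d));
    Image warped = op.warp_to_left(R[i], prior);
    Disp delta   = op.correlate(L[i], warped, radius);
    // Total disparity = smoothed prior + residual found against the warped image.
    d = prior;
    for (std::size_t k = 0; k < d.dx.size(); ++k) {
      d.dx[k] += delta.dx[k];
      d.dy[k] += delta.dy[k];
    }
  }
  return d;
}

Because the search radius stays fixed at every level, the per-level cost depends only on image size, which is why the runtime barely varies with terrain.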

Looks pretty good, right? There are some blemishes still that I hope to correct. Surprisingly, the parallel implementation of this iterated-warping correlator is 2x faster than our current pyramid correlator. Another surprising feature is that the runtime of this mapping algorithm is mostly constant regardless of the image content. At each pyramid level we are always searching a fixed square region, whereas the original ASP pyramid correlator must continually adapt its search range to the terrain it sees. Once I finish tuning this new algorithm I’ll write another post on exactly why this is the case. There is also a bit of a black art in smoothing the disparity that is used for remapping the right image.

Conclusion

I’m pretty excited again about finding a better correlator for ASP. I still have concerns about how this iterative mapping algorithm will handle occlusions. I also found out that our idea is not completely new. My friend Randy Sargent has been peddling this idea for a while [9]. He even implemented it for the Microscopic Imager (MI) on board the Mars Exploration Rovers. I didn’t even know that software existed! But they used homography matrices, while our ‘new’ idea uses a continuous function. In the end, I hope some of you find this dive into stereo research papers useful. I learned about a lot of cool ideas; unfortunately, very few of them scale to satellite imagery.

References

[1] Hirschmuller, Heiko. “Accurate and efficient stereo processing by semi-global matching and mutual information.” Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 2. IEEE, 2005.
[2] Bleyer, Michael, Christoph Rhemann, and Carsten Rother. “PatchMatch Stereo-Stereo Matching with Slanted Support Windows.” BMVC. Vol. 11. 2011.
[3] Heise, Philipp, et al. “PM-Huber: PatchMatch with Huber Regularization for Stereo Matching.” Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013.
[4] Steinbrucker, Frank, Thomas Pock, and Daniel Cremers. “Large displacement optical flow computation without warping.” Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
[5] Handa, Ankur, et al. Applications of Legendre-Fenchel transformation to computer vision problems. Vol. 45. Tech. Rep. DTR11-7, Department of Computing at Imperial College London, 2011.
[6] Zhang, Chi, et al. “As-Rigid-As-Possible Stereo under Second Order Smoothness Priors.” Computer Vision–ECCV 2014. Springer International Publishing, 2014. 112-126.
[7] Rhemann, Christoph, et al. “Fast cost-volume filtering for visual correspondence and beyond.” Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
[8] Yoon, Kuk-Jin, and In So Kweon. “Adaptive support-weight approach for correspondence search.” IEEE Transactions on Pattern Analysis and Machine Intelligence 28.4 (2006): 650-656.
[9] Sargent, Randy, et al. “The Ames MER microscopic imager toolkit.” Aerospace Conference, 2005 IEEE. IEEE, 2005.

Mimicking Zero Gravity

I know I promised a different blog post than this one, but I’m a bit crammed for time lately. However, I do have something quick to show …

This summer, my interns Abe Shultz and Benjamin Reinhardt rejuvenated an old gantry system that is used to mimic zero gravity for small spacecraft. This is part of my work with the NASA SPHERES program. In this video I demonstrate the gantry responding to external forces applied to a wooden analog for a SPHERE. In the future the real SPHERE will be loaded up and will use its CO2 thrusters to move itself about the room, with the gantry responding accordingly.

This test apparatus is one facility I plan to use to test out a new visual navigation system that my coworkers and I are developing. What I’m looking forward to is the vomit comet flight we have scheduled towards the end of the year. This is why I’ve left some of the software development of ASP with others.

Satellite Epipolar Rectification

Whenever I meet anyone remotely familiar with stereo algorithms, they tend to ask me why ASP doesn’t have an epipolar rectification step.

“It does, but for pinhole camera models.”
“Why not satellites?”
“Well, they don’t really have epipolar lines, but rather epipolar curves! That makes rectification very difficult and not reversible.”

This has been my default response for the last few years. But it has been a long time since I first tried to understand this topic. Back then I thought the solution would require some difficult-to-implement polynomial mapping or rubber-sheet transform [1]. I’ve been with NASA a while now; maybe this time I can better grasp what is required.

Well, I’ve gone back through the papers. Surprisingly, epipolar rectification is still a popular subject to publish on in the photogrammetric and remote sensing world. The new papers share a lot of my complaints, in that proper resampling using the rigorous camera model is hard. However, they show that for the height ranges that stereo modeling is interested in, the epipolar lines can be considered linear. Even more interesting, the papers note that all the epipolar lines are parallel [2][3]. They then attempt to solve for affine transforms for both the left and the right images that reduce the Y disparity and minimize image distortion. Solving for the best affine transform that doesn’t warp the image too much is the primary trick to this task, and a few master’s theses have been written on the topic.

Some of the students like to introduce a new camera model, like the parallel projection model. I’d like to connect the dots and say that this is not a novel idea at all. In the computer science community this is known as the affine camera model. My favorite book has a whole chapter on the topic [4]. Hartley even spends a few pages on solving for the affine fundamental matrix and then gives a teaser image about its application to stereo rectification for high-altitude aerial imagery.

Math

Hartley never goes into how to do affine epipolar rectification in his book. I’d like to document my attempt at this topic, as I believe it is a lot simpler than the remote sensing papers make it out to be. It is also simpler than some of the computer science papers make it out to be [5].

The general plan of affine epipolar image rectification is best described in 3 steps.

1.)  Identify the slope of the epipolar lines and rotate them so they are horizontal. I’ll be using image correspondences to solve for the affine fundamental matrix, which will get me the epipolar lines and then rotation matrices.
2.)  Align and scale the Y axis of the right image to match the left image.
3.)  Align, scale, and solve for an appropriate skew to minimize the amount of searching a stereo algorithm would have to do in the X axis.

Solving for the fundamental matrix is an algorithm I took from [4]. An affine fundamental matrix differs from a regular fundamental matrix in that its epipoles are at infinity. This makes the top-left 2×2 block of the matrix zero, so only the remaining five elements contain any information. Solving for those elements, we enforce the constraint that:
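
With the five unknowns written as a, b, c, d, and e (my labels; e is the bottom-right corner element discussed below), each image correspondence (x, y) ↔ (x′, y′) gives one linear equation:

\[
\begin{bmatrix} x' & y' & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 & a \\ 0 & 0 & b \\ c & d & e \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= a x' + b y' + c x + d y + e = 0
\]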

When you expand that equation out, you’ll see that the bottom-right corner element is not multiplied by any of the image correspondence measurements. So we’ll solve for the other elements first; we could back-substitute for ‘e’ later, but I’m never going to use that element, so let’s forget ‘e’ exists. Solving for the other parameters is achieved by finding the singular vector associated with the smallest singular value.

The last row and column of the affine fundamental matrix represent the epipoles of the left and right image.

Let’s use them to solve for the rotation matrices that flatten those epipolar lines. Here was my derivation, where I attempted to avoid any trigonometric functions. In hindsight, this makes the algorithm harder to read. At least we avoided a CPU table lookup!
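
In essence, the rotations come straight from the epipolar directions. With the labels a through d from above, the left image’s epipolar lines all run along the direction (−d, c) and the right image’s along (−b, a), so (up to sign conventions) the rotations are simply the normalized directions stacked into a matrix:

\[
R_{\text{left}} = \frac{1}{\sqrt{c^2 + d^2}}
\begin{bmatrix} -d & c \\ -c & -d \end{bmatrix},
\qquad
R_{\text{right}} = \frac{1}{\sqrt{a^2 + b^2}}
\begin{bmatrix} -b & a \\ -a & -b \end{bmatrix}
\]

Each matrix maps its epipolar direction to the horizontal axis and has determinant one, so no trig functions are needed.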

Now we are finally at step 2, where we need to find a scaling and offset for the Y axis. This can be performed in a single blow using LAPACK’s least-squares routine if we just arrange the problem correctly.
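
Arranged as an overdetermined linear system (my notation), each rotated correspondence contributes one row, and the right image’s Y scale s and offset t fall out of the least-squares solution:

\[
\begin{bmatrix} y'_1 & 1 \\ y'_2 & 1 \\ \vdots & \vdots \\ y'_n & 1 \end{bmatrix}
\begin{bmatrix} s \\ t \end{bmatrix}
\approx
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]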

Step 3 is solving for the scaling, skew, and offset in X. This too can be performed in a single blow using least squares.
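
The system has the same shape as before, just with one more column for the skew term (again my notation, with u, k, and t the X scale, skew, and offset for the rotated right image):

\[
\begin{bmatrix} x'_1 & y'_1 & 1 \\ \vdots & \vdots & \vdots \\ x'_n & y'_n & 1 \end{bmatrix}
\begin{bmatrix} u \\ k \\ t \end{bmatrix}
\approx
\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
\]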

Again, we’ll apply the X scaling and offset only to the right affine transform. However, I thought the skew should be applied equally to both images in order to minimize distortion. Splitting the skew evenly between the two image transforms, and deciding how to apply the X scaling, is achieved via:

Application to Mars Imagery

I wrote a C++ implementation of the above algorithm and applied it to the example stereo pairs ASP supplies for the MOC, CTX, and HiRISE missions. I collected image correspondences for each stereo pair to work out what the search range needs to be in order to create a DEM. The area of this search range is proportional to how much work the stereo correlator would need to do. (Please ignore that in ASP we use a hierarchical method that provides some optimization, making processing time not linearly related to the area of the search range.) I then compared the expected search range with alignment options of NONE, HOMOGRAPHY, and the new algorithm. Below are the results.

                    MOC                     CTX                        HiRISE
Image Size          1024×8064               5000×30720                 20000×80000
NONE's Range        [-389 -729]x[-104 604]  [-488 -1320]x[468 -1295]   [-15532 2112]x[18325 8145]
NONE's Area         379905                  23900                      204259281
HOMOGRAPHY's Range  [-7 -2]x[7 3]           [-56 -10]x[57 11]          [-16176 -342]x[18164 402]
HOMOGRAPHY's Area   70                      2373                       25548960
EPIPOLE's Range     [-7 -2]x[7 3]           [-58 -2]x[57 1]            [-16228 -53]x[18128 287]
EPIPOLE's Area      70                      345                        11681040

In the MOC case, homography alignment produced the same results as epipolar alignment. But once the images start to get bigger, it becomes clear that epipolar alignment is the winning strategy. The next test is to see if the stereo correlator still works on this imagery; we might be introducing skew or scaling errors that the correlator can’t recover from. I have yet to finish that part.

Another fun note: epipolar rectification is how you produce anaglyphs! Here are a few examples of what I was able to do with GIMP. I followed the guide linked here.

So … it looks like I need to put this in ASP 2.2 now. That’s what I’m going to be doing after this blog post.

References:

[1] Oh, Jaehong. Novel Approach to Epipolar Resampling of HRSI and Satellite Stereo Imagery-based Georeferencing of Aerial Images. Diss. The Ohio State University, 2011.
[2] Morgan, Michel, et al. “Epipolar resampling of space-borne linear array scanner scenes using parallel projection.” Photogrammetric engineering and remote sensing 72.11 (2006): 1255-1263.
[3] Wang, Mi, Fen Hu, and Jonathan Li. “Epipolar resampling of linear pushbroom satellite imagery by a new epipolarity model.” ISPRS Journal of Photogrammetry and Remote Sensing 66.3 (2011): 347-355.
[4] Hartley, Richard, and Andrew Zisserman. Multiple view geometry in computer vision. Vol. 2. Cambridge, 2000.
[5] Liansheng, Sui, Zhang Jiulong, and Cui Duwu. “Image rectification using affine epipolar geometric constraint.” Computer Science and Computational Technology, 2008. ISCSCT’08. International Symposium on. Vol. 2. IEEE, 2008.

Winging a DEM for a mission using World View 1

The group I work for at NASA has a big robot that likes to drive through a quarry at speed. Doing this is risky, as we could easily put the robot in a position to hurt itself or hurt others. One of the things we do to mitigate the risk is to have a prior DEM of the test area. The path planning software can then use the DEM to determine where the robot is and what terrain is too difficult to cross.

Since ASP recently gained the ability to process Digital Globe and GeoEye imagery (more about that in a later post), I was asked to make a DEM from some World View 1 imagery the group had purchased. The location was Basalt Hills, a quarry at the south end of the San Luis Reservoir. To process this imagery with any speed, it must be map projected onto some prior DEM. My choices were SRTM or NED, and in my runs both DEMs had problems. SRTM has holes in the middle of it that needed to be filled so ASP would work correctly. NED had linear jumps in it that ASP couldn’t entirely reverse in its math.

I ended up using SRTM as a seed to create my final DEM of the quarry. If you haven’t seen this before, the process looks like the commands below in ASP 2.0+. What’s happening is that ASP uses an RPC map projection to overlay the imagery on SRTM. When it comes time for ASP to triangulate, it reverses the math it used to map project, and then, in the case of Digital Globe, it triangulates using the full camera model. Another thing worth noting is that ASP needs control over how the interpolation is performed during the RPC map projection. This forces us to not use the GDAL utilities during this step and instead use our own custom utility.

# Map project both images onto the hole-filled SRTM DEM at 0.5 m/px,
# one rpc_mapproject job per image via GNU parallel.
parallel rpc_mapproject --tr 0.5 \
      --t_srs '"+proj=utm +zone=10 +datum=WGS84 +units=m +no_defs"' \
      filled_srtm_dem.tif {} {.}.XML {.}.srtm.crop.tif ::: left.TIF right.TIF
# Run stereo on the map-projected pair, passing the cameras and the seed DEM.
stereo left.srtm.crop.tif right.srtm.crop.tif left.XML right.XML \
      r1c1_srtm_crop/r1c1_srtm_crop filled_srtm_dem.tif

Afterwards we got a pretty spiffy result that definitely shows more detail than the prior DEM sources. Unfortunately, the result was shifted from the NED DEM source that my crew had previously been using. Ideally this would be fixed by bundle adjusting the World View camera locations; that adjustment was clearly needed, as most of our projected rays only came within 3 meters of each other. Unfortunately, ASP doesn’t have that implemented.

EDIT: If I had paid closer attention to my data, I would have noticed that a large part of the difference I was seeing between my DEM and USGS’s NED was because the NED data uses a vertical datum. My ASP DEMs are referenced against the WGS84 ellipsoid, while the NED data is referenced against WGS84 plus NAVD88. This would account for a large part of the 30 meter error I was seeing. (11.19.12)

My “I’m-single-with-nothing-going-on-tonight” solution was the Point Cloud Library. It has promising iterative closest point (ICP) implementations inside it and will eventually have the normal distribution transform algorithm in it. It also has the benefit of having its libraries designed with some forethought compared to the hideous symbol mess that is OpenCV.

PCL's pcd_viewer looking at the quarry.

I achieved ICP with PCL by converting my PC (point cloud) file from ASP into a PCL PCD file [1]. I also converted the NED DEM into a PCD file [2]. I then subsampled my ASP point cloud to something more manageable for PCL’s all-in-memory tactics [3]. Then I performed ICP to solve for the translation offset between the two clouds [4]. My offset ended up being about a 40 meter shift in the north and vertical directions. I then applied this translation back to the ASP PC file [5] so that the DEM and DRG could be re-rendered together using point2dem like normal.

I wrote this code in the middle of the night using a lot of C++ because I’m that guy. Here’s the code I used just for reference in the event that it might help someone. Likely some of the stuff I performed could have been done in Python using GDAL.

1. convert_pc_to_pcd.cc
2. convert_dem_to_pcd.cc
3. pcl_random_subsample.cc
4. pcl_icp_align.cc
5. apply_pc_offset.cc
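
For anyone who just wants the gist of step 4 without opening the attachment, a minimal sketch of the ICP alignment with PCL looks roughly like this (assuming a recent PCL where the source cloud goes in via setInputSource; the file names, iteration count, and correspondence distance are placeholders, not the values I used):

#include <iostream>
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/registration/icp.h>

int main(int argc, char** argv) {
  if (argc < 3) { std::cerr << "usage: icp_sketch source.pcd target.pcd\n"; return 1; }

  // Load the subsampled ASP cloud (source) and the NED reference cloud (target).
  pcl::PointCloud<pcl::PointXYZ>::Ptr source(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::PointCloud<pcl::PointXYZ>::Ptr target(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::io::loadPCDFile(argv[1], *source);
  pcl::io::loadPCDFile(argv[2], *target);

  // ICP: repeatedly match nearest neighbors and solve for the rigid transform.
  pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
  icp.setInputSource(source);
  icp.setInputTarget(target);
  icp.setMaximumIterations(100);            // assumed setting
  icp.setMaxCorrespondenceDistance(100.0);  // meters; generous for a ~40 m shift

  pcl::PointCloud<pcl::PointXYZ> aligned;
  icp.align(aligned);

  std::cout << "converged: " << icp.hasConverged()
            << "  fitness: " << icp.getFitnessScore() << "\n";
  // 4x4 transform; for this problem the interesting part is the translation column.
  std::cout << icp.getFinalTransformation() << std::endl;
  return 0;
}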

After rendering a new DEM of the shifted point cloud, I used MDenoise to clean up the DEM a bit. This tool is well documented at its own site (http://personalpages.manchester.ac.uk/staff/neil.mitchell/mdenoise/).

I’ve also been learning some QGIS. Here are some screenshots where you can see the improved difference map between NED and my result after ICP. Generally this whole process was very easy. It leads me to believe that, with some polish, this could become a nice automated way to build DEMs and register them against a trusted source. Ideally bundle adjustment would be performed, but I have a hunch that satellite positioning for Earth targets is so good that very little shape distortion has happened in our DEM triangulations. I hope this has been of interest to some of you out there!

Difference map between the USGS NED map and ASP's WV01 result.