Patch Match Stereo

Three months ago, in my Semi-Global Matching post, I mentioned I would have a follow-up post about another algorithm I was interested in. That algorithm is PatchMatch, which is in my opinion a better fit for satellite processing. It was first developed for hole filling [1] and later applied to stereo [2]. Unlike Ames Stereo Pipeline’s currently implemented hierarchical discrete correlator, PatchMatch easily applies to high-dimensionality searching, and its run time depends on the number of iterations and the image size, not the image content, which I hope translates into predictable run times. Unfortunately my current implementation is really slow and has several faults. However, my hope is that eventually we’ll make it fast, and that its predictability will let our users budget for the CPU cost of creating a DEM while still improving quality.

How it Works

The algorithm behind PatchMatch is actually very simple. The disparity image is seeded with uniform noise; that noise is a field of random guesses for what the disparity should be. Since every pixel is another guess, and the image is quite a bit larger than the search range we are looking over, a handful of these guesses are bound to be close to correct. Disparities with low cost are then propagated to their neighbors. The cycle repeats by adding more noise to the current propagated disparity, but at half the previous amplitude, and continues until everything converges. Surprisingly, a recognizable disparity appears after the first iteration.

If you are having a hard time visualizing this, here is a video from the original authors of PatchMatch Stereo.
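
To make the loop concrete, here is a minimal sketch in Python. This is not my VW implementation: it assumes grayscale NumPy images, simplifies the disparity to be purely horizontal, uses a plain SAD cost, and rounds candidate disparities to whole pixels rather than interpolating (the expensive part of the real thing). All the function names are mine.

import numpy as np

def sad(left, right, x, y, d, k=3):
    # Sum of absolute differences over a (2k+1)^2 kernel at disparity d
    xr = int(round(x + d))
    if min(x, y, xr) < k or x + k >= left.shape[1] or \
       xr + k >= right.shape[1] or y + k >= left.shape[0]:
        return np.inf
    return np.abs(left[y-k:y+k+1, x-k:x+k+1].astype(float) -
                  right[y-k:y+k+1, xr-k:xr+k+1].astype(float)).sum()

def patch_match(left, right, dmin, dmax, iterations=5):
    h, w = left.shape
    disp = np.random.uniform(dmin, dmax, (h, w))   # seed with uniform noise
    score = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            score[y, x] = sad(left, right, x, y, disp[y, x])
    radius = (dmax - dmin) / 2.0
    for _ in range(iterations):
        for y in range(h):
            for x in range(w):
                # Propagation: adopt the left/top neighbor's disparity if better
                for nx, ny in ((x - 1, y), (x, y - 1)):
                    if nx >= 0 and ny >= 0:
                        c = sad(left, right, x, y, disp[ny, nx])
                        if c < score[y, x]:
                            disp[y, x], score[y, x] = disp[ny, nx], c
                # Random search: perturb the current best guess
                cand = disp[y, x] + np.random.uniform(-radius, radius)
                c = sad(left, right, x, y, cand)
                if c < score[y, x]:
                    disp[y, x], score[y, x] = cand, c
        radius /= 2.0                               # halve the noise amplitude
    return disp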

PatchMatch can be quick in the hole-filling application and probably for stereo from video, where propagating between sequential frames can greatly help convergence. Unfortunately it is quite slow when applied to pairwise stereo. When performing stereo correlation in ASP 2.3, we group kernel evaluations that share the same disparity value (sometimes called a disparity plane in the literature). This means that overlapping kernel evaluations reuse pixel comparisons, which is a huge speed boost. My implementation of PatchMatch has none of that. My implementation also solves for disparity at floating-point precision. While this gives me very detailed disparity maps, the downside is that my implementation spends 75% of its time performing interpolation. I think for this algorithm to become useful to researchers, I’ll need to discretize the disparities and pre-render the input super-sampled, to avoid repeated interpolation calculations.
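
Here’s a sketch of that fix, assuming SciPy is available: pre-render the right image with its columns super-sampled once up front, and every fractional disparity afterwards becomes an integer lookup instead of a bilinear interpolation. The factor of 4 is an arbitrary choice:

import numpy as np
from scipy.ndimage import zoom

def presample(right, n=4):
    # Upsample columns by n with linear interpolation, done once up front
    return zoom(right.astype(float), (1, n), order=1)

def sample(right_super, x_float, y, n=4):
    # Nearest super-sampled column stands in for a bilinear interpolation
    return right_super[y, int(round(x_float * n))]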

I have one final point to make about the PatchMatch algorithm. In practice, PatchMatch produces results very similar to a brute-force search over the entire search range where the winner (lowest cost) takes all. That is different from ASP’s hierarchical search, which at each pyramid level falls into local minima; it is only the hierarchical part that is any use in finding the global minimum. What this means for PatchMatch is that we need a cost metric whose global minimum is the correct match. For narrow-baseline stereo in a controlled lighting environment, the fast Sum of Absolute Differences (SAD) fits the bill. But in the cruel realities of satellite imagery, the only solution is Normalized Cross Correlation (NCC). Unfortunately, NCC is quite a bit slower to evaluate than SAD. Also, in cases where SAD worked, NCC performs worse (probably due to its sensitivity to the estimate of the local mean).
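
For reference, here is what the two metrics look like on a pair of equal-sized grayscale patches; a NumPy sketch of my own, not ASP’s code. Both are written as costs to minimize, so the NCC similarity is flipped into 1 - NCC. The mean subtraction and normalization are what buy NCC its lighting invariance, and they are also the extra arithmetic that makes it slower than SAD.

import numpy as np

def sad_cost(a, b):
    return np.abs(a.astype(float) - b.astype(float)).sum()

def ncc_cost(a, b):
    a = a.astype(float) - a.mean()   # removing the mean cancels brightness offsets
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 1.0 - (a * b).sum() / denom if denom else 1.0  # 0 = perfect match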

Where PatchMatch does poorly

I’ve already covered my primary concern, which is the speed of PatchMatch. However, if we view PatchMatch as a replacement for both correlation and BayesEM, it already appears cost effective. A different concern is what happens when the search range’s area is larger than the area of the image being correlated. A real-world example would be rasterizing a 256^2 tile for a WV2 image where the search range is 1400×100: the area of the tile is less than the search’s, so the number of close valid guesses seeded by the initial uniform noise drops dramatically. It might be a good idea to take the interest points that ASP finds and seed them into the initial random disparity guesses that PatchMatch evaluates. This would ensure there is always a good disparity to propagate from.
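
A sketch of that seeding idea, sticking with the 1D-disparity simplification from the earlier sketch; `matches` is assumed to be a list of ((x_left, y_left), (x_right, y_right)) interest point pairs from ASP’s matching stage, which is hypothetical plumbing on my part:

import numpy as np

def seed_disparity(shape, dmin, dmax, matches):
    disp = np.random.uniform(dmin, dmax, shape)  # the usual uniform noise
    for (xl, yl), (xr, yr) in matches:
        disp[int(yl), int(xl)] = xr - xl         # a known-good guess to propagate
    return disp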

Previously I mentioned that PatchMatch, in my experiments, seems to behave like a brute-force winner-takes-all algorithm. This means that increasing the search size also decreases quality, because the odds of finding an outlier with lower cost than the actual correct match go up. So maybe abandoning hierarchical search entirely is a bad idea. Another idea might be to enforce a smoothness constraint that heavily penalizes the random jumping that characterizes outliers; the enforcement of a smoothness constraint was the driving force behind the power of Semi-Global Matching.
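
If we went that route, the cheapest sketch I can think of is a penalty term added to the data cost, so a candidate disparity that jumps away from its neighbors has to earn the jump. The weight lam is a made-up tuning knob, not a value from SGM or any PatchMatch paper:

def regularized_cost(data_cost, d, neighbor_disps, lam=0.1):
    # Outliers tend to disagree wildly with their neighbors; charge for that
    return data_cost + lam * sum(abs(d - nd) for nd in neighbor_disps)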

Expanding the Algorithm

Since kernel evaluations in PatchMatch are individual problems, with no shared arithmetic like there is in ASP’s stereo correlation, it is easy to bolt advanced algorithms onto PatchMatch. The original PatchMatch Stereo paper [2] mentioned adding parameters to the disparity maps so that an affine window could be used for matching. This would improve results on slopes, and I believe it would be a better alternative to map-projecting the input imagery beforehand.

Another idea mentioned in the paper was adding Adaptive Support Weights (ASW) [3] to each kernel evaluation. ASW weights pixels that match the color of the kernel’s center more heavily, in addition to weighting central pixels more heavily than pixels at the edge. The idea is that pixels of the same color are likely to be at the same disparity. While not always correct, it definitely applies to scenarios where the top of a building is a different color than its sides. The implementations I show in my figures operate only on grayscale imagery, not color as the original paper proposed.
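
Here is a minimal grayscale sketch of that weighting, in the spirit of [3]: pixels that look like the kernel’s center pixel, and sit close to it spatially, get more say in the cost. The gamma constants are values I picked for illustration, not the paper’s:

import numpy as np

def asw_weights(patch, gamma_c=10.0, gamma_p=7.0):
    k = patch.shape[0] // 2
    yy, xx = np.mgrid[-k:k+1, -k:k+1]
    d_color = np.abs(patch.astype(float) - float(patch[k, k]))  # similarity term
    d_space = np.sqrt(xx ** 2 + yy ** 2)                        # proximity term
    return np.exp(-(d_color / gamma_c + d_space / gamma_p))

The kernel cost then becomes a weighted SAD: multiply the left and right windows’ weights together, apply them to the absolute differences, and normalize by the total weight.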

In practice, this additional weighting does a good job of preserving edges at depth discontinuities, which is important for cliffs and for buildings. A proposed improvement is geodesic adaptive support weights, which heavily weights same-color pixels only when they are connected to the central pixel. This fixes problems like a picture of blades of grass, where the blades all have the same color but are actually at different disparities.

Wrapping back around to the idea that PatchMatch needs a cost metric that is globally minimal: it might be worthwhile exploring different cost metrics, such as Mutual Information or, if I remember correctly, the fast-evaluating Matching by Tone Mapping (MTM) [4]. A nice side effect would be that correlation becomes completely lighting invariant and could even correlate between infrared and visible imagery.

Additional Comparisons

Here’s a disparity image of NASA Ames Research Center captured from a Pleiades satellite. Whatever they do to pre-process their imagery, I couldn’t epipolar-rectify it. The search range given to both ASP and PatchMatch was [-40,-20,70,40]. Despite being noisy, I like how PatchMatch w/ ASW preserved straight edges. The specularity of some of the roof lights on a hangar, however, really threw ASW for a loop.

I also processed a crop from a WorldView-2 image of somewhere in the McMurdo Dry Valleys. This really shot a hole in my argument that PatchMatch would be completely invariant to search range. The search range was [-700,-50,820,50]. I did, however, have to hand-tune the search range to get this excellent result from ASP; the automatic detection initially missed the tops of the mountains.

Source Code

I’m being messy here because this is my research code, not something production-ready. You’ll need to understand VW and Makefiles if you want to compile and modify it.

https://github.com/zmoratto/PatchMatch

Further Research

I’m still digging through papers during my free evenings. I don’t do this at work because I have another project that is taking my soul. There appears to be a good paper called PatchMatch Filter that tries to improve speed through superpixel calculation [5]. There is also an implementation of PatchMatch that adds a smoothness constraint and performs better than the original PatchMatch Stereo paper [6]. When it came out, it was the best-performing algorithm on the Middlebury dataset; recently another graph-cut algorithm dethroned it. I will also look at applying PatchMatch as a refinement to a noisy disparity result from ASP 2.3 using a kernel size of 3. A small kernel still evaluates extremely quickly even if the search range is huge.

I’m sorry this post doesn’t end in a definitive conclusion. However, I hope you found it interesting.

References

[1] Barnes, Connelly, et al. “PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing.” ACM Transactions on Graphics (TOG) 28.3 (2009): 24.
[2] Bleyer, Michael, Christoph Rhemann, and Carsten Rother. “PatchMatch Stereo: Stereo Matching with Slanted Support Windows.” BMVC. Vol. 11. 2011.
[3] Yoon, Kuk-Jin, and In So Kweon. “Adaptive Support-Weight Approach for Correspondence Search.” IEEE Transactions on Pattern Analysis and Machine Intelligence 28.4 (2006): 650-656.
[4] Hel-Or, Yacov, Hagit Hel-Or, and Eyal David. “Fast Template Matching in Non-Linear Tone-Mapped Images.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[5] Lu, Jiangbo, et al. “PatchMatch Filter: Efficient Edge-Aware Filtering Meets Randomized Search for Fast Correspondence Field Estimation.” Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.
[6] Heise, Philipp, et al. “PM-Huber: PatchMatch with Huber Regularization for Stereo Matching.” Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013.

Bundle Adjusting HiRISE

Last week, I showed off a method for processing LRO-NAC. Now I’m going to show an even more difficult process with HiRISE. Each LRO-NAC observation was 2 CCDs, and that already made for a lot of click work. HiRISE has a whopping 10 CCDs, which would make bundle adjustment very tricky if we treated all 20 files as individual cameras. To get around this problem, we’ll deploy a new feature in Jigsaw (the Observation option) and we’ll have to modify the HiEDR2Mosaic script that is released with Ames Stereo Pipeline.

HiRISE Preparation

I recently found out that there is a nifty UofA site from Shane Byrne that details all the stereo pairs captured by HiRISE. It is available at this link. I then stalked the user ‘mcewen’ and processed a stereo pair he selected. If I were a nice guy I would process stuff selected by user ‘rbeyer’, but he images boring places. Loser! Anyways, for this demo I’ll be processing ESP_013660_1475 and ESP_013950_1475. UofA says it’s a “Gullied 35 Kilometer Diameter Impact Crater in Promethei Terra”. Whatever, I just want 3D.

At this point I would normally tell you to run HiEDR2Mosaic blindly. Unfortunately that won’t work here, because that script will attempt to project the outer CCDs into RED5 or RED4’s frame of reference. Any projection is bad at this stage because it requires using the spacecraft’s ephemeris, which we don’t trust and haven’t corrected yet. It also won’t work because noproj will drop some observation serials or whatnot. To fix this, we want to stop HiEDR2Mosaic right before it does noproj, perform our jigsaw, and afterwards resume HiEDR2Mosaic. Doing that required the modification that I checked into GitHub here. Just download the ‘py.in’ file and rename it to ‘py’. The ‘in’ suffix just means that our build system is going to burn the ASP version into the script at compile time.

Now the run looks like the following:

download all ESP_013660_1475 IMGs
download all ESP_013950_1475 IMGs
hiedr2mosaic.py --stop-at-no-proj ESP_013660*IMG
hiedr2mosaic.py --stop-at-no-proj ESP_013950*IMG

Bundle Adjusting

No magic here; it’s the same process as usual for creating a control network. I just picked a special XSpacing of 200 meters and YSpacing of 3 km for Autoseed. This was to make sure that there would be control points on all 10 CCDs; I was guessing that a single HiRISE CCD has a swath of ~500 meters. I also made the MinimumThickness a very small number (something like .0000001), since the HiRISE CCDs are very thin strips. After Autoseed, you proceed to manually clean up the control network, which will take a very long while. Then you should perform a couple of jigsaw runs to identify control points that have mistakes and correct them in qnet. However, for HiRISE all your jigsaw runs should use the option observations=yes. This tells jigsaw that images from the same observation should be treated as a single camera, so CCDs 0 through 9 will be treated as one. Without this flag, jigsaw will never converge.
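
For concreteness, a jigsaw call with that option set looks like the following (the same form used in my looping script later in this post; cube.lis lists all 20 CCD cubes):

jigsaw fromlist=cube.lis cnet=control.net onet=output.net radius=yes twist=yes \
    spsolve=position camsolve=velocities observations=yes update=yes maxits=100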

You might be asking why I didn’t use this option for LRO-NAC, since that camera also has multiple CCDs. That’s because in the case of LRO-NAC, the CCDs are in two separate optical housings whose position and angle change noticeably with the thermal cycling of the spacecraft. On HiRISE, all the CCDs are on the same optical plane and they all use the same optics, so any relative motion is not noticeable at all. If you use the observation option for LRO-NAC, you’ll find that it is impossible to get a sigma0 value under 2 pixels.

In my last post, I tried an idea where I didn’t use any ground control points and instead tried to make jigsaw auto-register to the default ISIS DEM, which is usually the best altimeter data available. I wanted to try that again; unfortunately HiRISE doesn’t have a big enough footprint to cover much detail in MOLA, so I added 2 CTX images to the jigsaw problem. Those images were B10_013660_1473_XN_32S256W and B11_013950_1473_XN_32S256W. I made their control network separately and then used cnetmerge to add them to the HiRISE network. I then proceeded to match a bunch of the control points between the two imagers. This would have been easier if I had just processed the images together from the beginning.

Another problem I noticed from the last post was that with every jigsaw run, all control points would start from their ‘apriori’ position. I was updating their radius with cnetnewradii, but their latitude and longitude kept snapping back to their original values. I wanted this to behave like ICP, so I modified my script so that after every jigsaw run, the ‘apriori’ latitude and longitude would be replaced by the ‘adjusted’ solution. Below is that code.

adjusted2apriori.py:

#!/usr/bin/env python
# Usage: ./adjusted2apriori.py <input pvl> <output pvl>
# Copies each control point's Adjusted X/Y/Z values over its Apriori
# X/Y/Z values, so the next jigsaw run starts from the last solution.
import sys

inf = open(sys.argv[1], 'r')
outf = open(sys.argv[2], 'w')

delayed_write = False   # True while buffering the Apriori..Adjusted region
delayed_lines = []      # lines held back until the Adjusted values arrive

for line in inf:
    # AprioriX marks the start of the region we need to rewrite
    if 'AprioriX' in line:
        delayed_write = True

    # Rename the Adjusted lines to Apriori and splice them into the buffer
    if 'AdjustedX' in line:
        delayed_lines.insert(5, line.replace('Adjusted', 'Apriori'))
    if 'AdjustedY' in line:
        delayed_lines.insert(6, line.replace('Adjusted', 'Apriori'))
    if 'AdjustedZ' in line:
        delayed_lines.insert(7, line.replace('Adjusted', 'Apriori'))
        # AdjustedZ ends the region; flush everything we held back
        delayed_write = False
        for dline in delayed_lines:
            outf.write(dline)
        delayed_lines = []

    if not delayed_write:
        outf.write(line)
    else:
        # Drop the stale Apriori X/Y/Z lines; buffer everything else
        if 'AprioriX' in line or 'AprioriY' in line or 'AprioriZ' in line:
            pass
        else:
            delayed_lines.append(line)

inf.close()
outf.close()

bundleadjust.sh:

#!/bin/bash

input_control=control_comb_pointreg_const.net
radius_source=/home/zmoratto/raid/isis/isis3.4.1_ubuntu1204/data/base/dems/molaMarsPlanetaryRadius0005.cub

cp $input_control control_loop.net

for i in `seq 1 50`; do
    echo Iteration $i

    # Convert point's apriori position to be adjusted position
    cnetbin2pvl from= control_loop.net to= control_loop.pvl
    ./adjusted2apriori.py control_loop.pvl control_loop2.pvl
    cnetpvl2bin from= control_loop2.pvl to= control_loop.net

    cnetnewradii cnet= control_loop.net onet= output.net model= $radius_source getlatlon= apriori
    mv output.net control_loop.net

    jigsaw fromlist=cube.lis radius=yes twist=yes cnet= control_loop.net  onet= output.net update=yes spsolve= position camsolve= velocities observations= yes maxits= 100
    mv output.net control_loop.net

    # Gather statistics for user monitoring: sigma0, plus the mean and
    # standard deviation of the RADIUS corrections listed in bundleout.txt
    grep Sigma0: bundleout.txt
    list_length=`wc -l < bundleout.txt`
    interesting_part=`grep -n "POINTS DETAIL" bundleout.txt | awk -F ":" '{print $1}'`
    tail -n $(expr ${list_length} - ${interesting_part}) bundleout.txt | grep RADIUS --color=no | awk -F " " '{print $4}' | awk '{sum+=$1; sumsq+=$1*$1;} END {print "stdev = " sqrt(sumsq/NR - (sum/NR)**2) " meters";}'
    tail -n $(expr ${list_length} - ${interesting_part}) bundleout.txt | grep RADIUS --color=no | awk -F " " '{print $4}' | awk '{mean+=$1} END {print "mean = " mean/NR " meters";}'
done

Running jigsaw in this case was just a matter of running my script:

./bundleadjust.sh

Unfortunately, I never saw the standard deviation of the radius corrections decrease. So possibly my whole idea of a jigsaw without control points is flawed, or I need a better height source. I really want to go back and redo LRO-NAC now.

After we finish our bundle adjustment, we resume HiEDR2Mosaic to create a single image per HiRISE observation. With the modified script, that looks like the following; notice that this time I’m giving the script the histitch cube files instead of the IMG files.

./hiedr2mosaic.py --resume-at-no-proj ESP_013660*histitch.cub
./hiedr2mosaic.py --resume-at-no-proj ESP_013950*histitch.cub

Processing in ASP

I ran the stereo command as shown below with the default stereo.options file. That means everything is in full auto and I’m only using parabola subpixel refinement, so my output DEM looks a little ugly up close.

stereo ESP_013660_1475_RED.mos_hijitreged.norm.cub ESP_013950_1475_RED.mos_hijitreged.norm.cub HiRISE/HiRISE

Then I used the following point2dem options. Notice I’m using a custom latitude of true scale so that the crater of interest will be circular in the projection.

point2dem --t_srs "+proj=eqc +lat_ts=-32 +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +a=3396000 +b=3396000 +units=m +no_defs" --orthoimage HiRISE-L.tif HiRISE-PC.tif --tr 2

Results

Here are the difference maps between MOLA and: a non-bundle-adjusted ASP HiRISE DEM (Raw), a bundle-adjusted DEM with no GCPs, and a bundle-adjusted DEM with no GCPs but with the addition of the CTX imagery. We can see that the bundle adjustment definitely helps; it removes a 200-meter error.

Adding the CTX imagery definitely helped reduce the error against MOLA. However, if we look at a DEM created from the 2 additional CTX images we processed, we’ll see that there are still large pockets of error around the rim of the crater. When I flip back and forth between the MOLA hillshade and the CTX hillshade, I think I can definitely see a shift where the CTX DEM is too low and slightly shrunken.

So possibly my whole ‘no-GCPs’ idea is bunk. For the case of Mars, it is really easy to get ground control points against the THEMIS mosaic or a processed HRSC observation. You’ll just want to chain from the THEMIS mosaic to CTX and then to HiRISE, since there is such a large resolution change. I’ll leave it as a homework assignment for someone to work out the exact commands you need to run. At least now you should understand how to bundle adjust HiRISE.

Interesting GDAL Usage Examples

Despite first appearances, GDAL is way more than an image I/O library. I routinely use it to resize images, transcode images, and reproject images. But did you know that it can also draw contour lines, write KML, and orthorectify RPC images? Even if you did, I think it’s good to document these here because I keep forgetting them.

Render Contour Lines

Publishing DEMs created by ASP is tricky because it is easiest to visualize height change through a color gradient, which loses its usefulness when publishing to a conference that prints in grayscale. A snazzier option is a hillshaded render of a DEM with contour lines plotted on top. You can see an example on the right, and the commands are below:

gdaldem hillshade <DEM> output.tif -z 0.000012
gdal_contour -i 50 <DEM> contour
gdal_rasterize -burn 0 -l contour contour output.tif

The “-z” value for gdaldem hillshade requires some playing around to figure out what is best for your image. The “-i” value for gdal_contour is the interval between lines in the height units defined by the DEM (for ASP products, meters).

KML Outline of a Cube File’s footprint

Rendering of KML output

Ever wanted to provide context for your ISIS cube file image on Google Earth? These commands let you visualize the image’s footprint, shown on the right. The first two lines are ISIS commands.

footprintinit from=<ISIS Cube File>
isis2gml from=<ISIS Cube File> to=output.gml label=footprint
ogr2ogr -f KML output.kml output.gml

Orthorectify an RPC image on a DEM

This one doesn’t apply to planetary imagery, but most Earth images tend to have an RPC camera model attached to them. Download a DEM from the USGS’s NED, and then with GDAL you can drape your source imagery over it.

 gdalwarp -of GTiff -co TILED=yes -co COMPRESS=lzw -r cubicspline \
    -rpc -to RPC_DEM=<your DEM> <RPC image> output.tif -dstnodata 0

Getting better results with LRO-NAC in Ames Stereo Pipeline

The Lunar Reconnaissance Orbiter (LRO) has been in space now for more than 2 years. It produces extremely high-resolution images of the Moon, the likes of which the world hasn’t seen in half a century. This is both a blessing and a curse for the Stereo Pipeline. Its high-resolution nature means that we can get meter-level accuracy easily. The downside is that even the smallest hills in LRO-NAC imagery can cause huge disparities, and large disparities mean long processing times.

LRONAC Example Raw Stereo Input Image

Take for example this pair of LRO-NAC images of the Apollo 15 landing site, observations M111571816 and M111578606. For simplicity, let’s only discuss the left frame (the LE files). The bright impact crater, seen at the top of the frame, moves right 2500 pixels between frames, while the hills at the bottom of the frame move left 500 pixels. This means that in the horizontal direction alone, the Stereo Pipeline must search ~3000 pixels! This is not only slow, but makes the correlation process very susceptible to matching noise. When we correlated those images directly, the whole process failed because the integer correlator couldn’t tell the difference between noise and the correct solution.

LRONAC Images Mapprojected to LOLA

If we map project the images, the results get a little better. The images below were created by running ISIS3’s spiceinit and cam2map. Now the disparity is more reasonable and ASP can easily correlate the top of the frame. However, the search at the bottom of the frame is still pretty extreme: there is about a ~500 pixel shift in the hills. This is less than before; unfortunately ASP still correlates against noise. The automatic settings recommended a search of 1400 by 100 pixels. This is probably larger than what is required, since the automatic settings pad their numbers to catch parts of the image they couldn’t get readings from.

The big question is: why didn’t the disparity completely disappear after map projection? This is a fault of the 3D data used for the map projection. By default, ISIS map projects against LOLA data. At the equator, the LOLA data is very sparse and doesn’t do a good job of modeling the mountains and hills. Luckily, a new 3D data source has become available: the LRO-WAC Global DTM created by the DLR. This is a dense 100 meter/px 3D model and makes a much better product to map project imagery onto. It’s still not perfect, since it is about 2 orders of magnitude lower resolution than our LRO-NAC imagery, but it is better than the LOLA data and will help reduce our disparity search range.

The first trick is getting ISIS to map project against the new DEM data. This is not entirely easy and requires some planning.

Download and prepare a WAC DTM tile

The example images I’m using for this article are located at 3.64E, 26N, so I’ve only downloaded the tile that covers the north and the first 90 degrees of longitude of the global DTM. This link is where you can download WAC DTM tiles yourself. You then need to convert the tile to a cube format that ISIS can understand. This involves a file conversion, a conversion to radial measurements, and finally attaching some statistical information to the DTM. Here are the commands I used; I strongly recommend reading the ISIS documentation to learn exactly what each command is doing.

> <download WAC_GLD100_E300N0450_100M.IMG>
> pds2isis from=WAC_GLD100_E300N0450_100M.IMG to=WAC_GLD100_E300N0450_100M.cub
> algebra from=WAC_GLD100_E300N0450_100M.cub to=WAC_GLD100_E300N0450_100M.lvl1.cub operator=unary A=1 C=1737400
> demprep from=WAC_GLD100_E300N0450_100M.lvl1.cub to=WAC_GLD100_E300N0450_100M.lvl2.cub

I’m using the algebra command to convert the values in the WAC DTM tile from offset-from-datum values (where zero equals the datum radius of 1,737.4 km) to radius values (where zero equals the center of the Moon), which are the kind of values that ISIS needs for a shape model it will project images onto. There is probably a better way of doing this by just editing a metadata offset somewhere; unfortunately, I don’t know how. The demprep command adds a statistics table. You can open the result up in qview to check that the positioning is correct.
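
To make the arithmetic concrete: algebra’s unary operator computes out = A*in + C, so with A=1 and C=1737400 every pixel simply gets the datum radius added to it. A quick sanity check in Python:

height = -2500.0                  # a WAC height in meters, relative to the datum
radius = 1.0 * height + 1737400   # what the algebra command computes per pixel
assert radius == 1734900.0        # radius from the center of the Moon, in meters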

Attaching and projecting against a custom shape model

When users ‘spiceinit’ a file, they are attaching a lot of metadata to the image. Everyone knows that they are adding the camera model and the spacecraft’s ephemeris data. Yet they are also attaching what ISIS calls a shape model, which is actually a DEM/DTM. Map projecting against a new shape model requires redoing the ‘spiceinit’ step. Here’s an example, assuming you’ve already completed ‘lronac2isis’:

> spiceinit from=M111571816LE.cub shape=USER model=WAC_GLD100_E300N0450_100M.lvl2.cub

At that point you can run cam2map yourself, though I strongly recommend just using the cam2map4stereo.py script provided by ASP to map project your input for the stereo session. On the left is an example of LRO-NAC imagery draped over the WAC DTM. The differences are very fine; you might want to try toggling back and forth between the WAC-projected imagery and the LOLA-projected imagery.

Stereo Pipeline Results

Once you’ve map projected against the LRO-WAC DTM, you’ve created images that are much better aligned to each other. If you run everything in full automatic mode, it will attempt a search of about 600 px by 100 px. That’s half the search range of the previous cam2map results against LOLA! This means faster processing times and less memory required during the integer correlation step. On the right is my final result. Oddly, this image exposes a problem I’ve never seen before: the top half of the image exhibits ‘stair step’-like features. Anyways, this is one interesting idea for making LRO-NAC processing faster. I hope it allows more people to process these images for 3D.