Patch Match Stereo

3 months ago during my Semi Global Matching post, I mentioned I would have a follow along post about another algorithm I was interested in. That algorithm is PatchMatch, and is in my opinion a better fit for satellite processing. It was first written for hole filling [1] and then later it was applied to stereo [2]. Unlike Ames Stereo Pipeline’s currently implemented hierarchical discrete correlator, PatchMatch easily applies to high dimensionality searching and it has a run time that corresponds to number of iterations and image size. Not image content, which I hope interprets into predictable run times. Unfortunately my current implementation is really slow and has several faults. However my hope is that eventually we’ll make it fast and that it’s predictability will enable our users to budget for the CPU cost of creating a DEM while still improving quality.

How it Works

The algorithm for PatchMatch is actually very simple. The disparity image is seeded with uniform noise. That noise is random guesses for what the disparity should be. However since every pixel is another guess, and the image is quite a bit larger than the search range we are looking over, a handful of these guesses are bound to be close to correct. We then propagate disparities with low cost to the neighboring disparities.  The cycle repeats by then adding more noise to the current propagated disparity but at half the previous amplitude. This cycle repeats until everything converges. Surprisingly, a recognizable disparity appears after the first iteration.

If you are having a hard time visualizing this, here is a video from the original authors of Patch Match Stereo.

PatchMatch can be quick in the hole filling application and probably for stereo from video. (Propagating between sequential frames can greatly help convergence.) Unfortunately it is quite slow in use to pairwise stereo. When performing stereo correlation in ASP 2.3, we group kernel evaluations by being at the same disparity value (sometimes called disparity plane in literature). This means that overlapping kernel evaluations will reuse pixel comparisons. The reuse of pixel comparisons is a huge speed boost. My implementation of PatchMatch has none of that. My implementation is also solving for a floating-point precision of disparity. While this gives me very detailed disparity maps, the downside is that my implementation spends 75% of its time performing interpolation. I think for this algorithm to become useful for researchers, I’ll need to discretize the disparities and prerender the input as super sampled to avoid repeated interpolation calculations.

I have one final statement to make on PatchMatch algorithm. In practice, PatchMatch produces results that are very similar to performing a brute force search over the entire search range where winner (lowest cost) takes all. That is different from ASP’s hierarchal search, which at each pyramid level falls into local minimums. It is only the hierarchal part of it that has any use in finding the global. What this means for PatchMatch is that we need to use a cost metric that is globally minimal. For narrow baseline stereo in a controlled lighting environment, the fast Sum of Absolute Differences (SAD) fits the bill. But in the cruel realities of satellite imagery, the only solution is Normalized Cross Correlation (NCC). Unfortunately, NCC is quite a bit slower to evaluate than SAD. Also, in cases where SAD worked, NCC performs worse (probably due to being sensitive to the calculation of mean?).

Where PatchMatch does poorly

I’ve already hit my primary concern, which is the speed of Patch Match. However I think if we view PatchMatch as replacement for both correlation and BayesEM, it already appears cost effective. A different concern is what happens when the search range’s area is larger than the area of the image being correlated. A real world example would be rasterizing a 256^2 tile for a WV2 image where the search range is 1400×100. The area of the tile is less than the search’s. The number of close valid guesses seeded by the initial uniform noise drops dramatically. It might be a good idea to then take the interest points that ASP finds and dump them in the initial random disparity guess that PatchMatch evaluates. This would insure there is a disparity to propagate from.

Previously I mentioned that PatchMatch in my experiments seems to behave as a brute force winner takes all algorithms. This means that increase the search size also means a decrease in quality because our odds of finding an outlier with less cost than the actually correct match have gone up. So maybe completely abandoning hierarchal search entirely is a bad idea.  Other ideas might be enforcing a smoothness constraint that would heavily penalize the random jumping that characterizes outliers. The enforcement of smoothness constraint was the driving force behind the power of Semi Global Matching.

Expanding the Algorithm

Since kernel evaluations in PatchMatch are individual problems and there is no shared arithmetic like there is in ASP’s stereo correlation, it makes it easy to add on some advance algorithms to PatchMatch. The original PatchMatch Stereo paper [#] mentioned adding on parameters to the disparity maps so that an affine window could be used for matching. This would improve results on slopes and I believe would be a better alternative to prior map projecting the input imagery.

Another idea mentioned in the paper was adding Adaptive Support Weights (ASW) [3] to each kernel evaluation. It adds weighting to pixels that match the color of the center of the kernel in addition to weighting central pixels more importantly than pixels at the edge. The idea is that pixels of the same color are likely to be at the same disparity. While not always correct, it definitely applies to scenarios where the top of a building is a different color than its sides. Implementations I show in my figures operate only on grayscale imagery and not color like the original paper proposed.

In practice, this additional weighting does a good job at preserving edges at depth discontinuity. This is important for cliffs and for buildings. A proposed improvement is geodesic adaptive support weights, which weighs same color pixel heavily that are connected to the central pixels. This fixes problems like a picture of blades of grass, where the blades have the same color but are actually at different disparities.

Wrapping back around to the idea that Patch Match needs to have a cost metric that is globally minimal. It might be worth while exploring different cost metric such as Mutual Information or if I remember correctly, the fast evaluating Matching by Tone Mapping (MTM) [4]. Nice side effect of this would be that correlation is completely lighting invariant and could even correlate between infrared and visible.

Additional Comparisons

Here’s a disparity image of NASA Ames Research Center captured from a Pleiades satellite. Whatever they do to pre-process their imagery, I couldn’t epipolar rectify that imagery. Search range given to both ASP and PatchMatch was [-40,-20,70,40]. Despite being noisy, I like how the PatchMatch w/ ASW preserved straight edges. The specularity of some of the roof lights on a hanger however really threw ASW for a loop.

I also processed a crop from a World View 2 image of somewhere in the McMurdo Dry Valleys. This really shot a hole in my argument that Patch Match would be completely invariant to search range. The search range was [-700,-50,820,50]. I did however have to hand tune the search range to get this excellent result from ASP. The automatic detection initially missed the top of the mountains.

Source Code

I’m being messy because this is my research code and not something production. You’ll need to understand VW and Makefiles if you want to compile and modify.

Further Research

I’m still diving through papers during my free evenings. I don’t do this at work because I have another project that is taking my soul. However there appears to be a good paper called Patch Match Filter that tries to improve speed through super pixel calculation [5]. There is also an implementation of PatchMatch that adds a smoothness constraint that performs better than the original PatchMatch paper [6]. When it came out, it was the best performing algorithm on the Middlebury dataset. However, recently another graph cut algorithm dethroned it. I myself will also just look at applying patch match as a refinement to a noisy disparity result from ASP 2.3 using a kernel size of 3. Having a small kernel still evaluates extremely quickly even if the search range is huge.

I’m sorry this post doesn’t end in a definitive conclusion. However I hope you found it interesting.


[1] Barnes, Connelly, et al. “PatchMatch: a randomized correspondence algorithm for structural image editing.” ACM Transactions on Graphics-TOG 28.3 (2009): 24.
[2] Bleyer, Michael, Christoph Rhemann, and Carsten Rother. “PatchMatch Stereo-Stereo Matching with Slanted Support Windows.” BMVC. Vol. 11. 2011.
[3] Yoon, Kuk-Jin, and In So Kweon. “Adaptive support-weight approach for correspondence search.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 28.4 (2006): 650-656.
[4] Hel-Or, Yacov, Hagit Hel-Or, and Eyal David. “Fast template matching in non-linear tone-mapped images.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[5] Lu, Jiangbo, et al. “Patch Match Filter: Efficient Edge-Aware Filtering Meets Randomized Search for Fast Correspondence Field Estimation.” Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference. 2013.
[6] Heise, Philipp, et al. “PM-Huber: PatchMatch with Huber Regularization for Stereo Matching.”