Patch Match with Smoothness Constraints

I really wanted to implement the Heise’s 2013 paper about Patch Match with Huber regularization [1]. However that is a little too much for me to bite off, I kept getting lost in the derivation of the Huber-ROF model. So instead I chose to implement something in between Heise’s paper and Steinbrucker’s 2009 paper about large displacement optical flow [2]. I shall call what I implemented the ghetto Patch Match with squared regularization. I’m fancy.


The idea behind this implementation is that we want to minimize the following equation over the image omega.

If this equation looks familiar, it’s because you’ve seen in Semi Global Matching and it is the basis for most all “global” matching ideas. An interesting technique that I think originates for Steinbrucker’s paper is the idea that solving this directly is impossible. However if we separate this equation into two problems with convex relaxation, we can solve both halves with independent algorithms. In the equation below, both U and V are the disparity image. When theta is large, U and V should be the same disparity image.

The first half of the equation is the data term or template correlation term, which Steinbrucker recommend solving with a brute force search and where Heise instead solved with Patch Match stereo. The right half of the equation is the same as the image denoising algorithm called Total Variation Minimization, first written up in 1992 by Ruden et al (ROF) [3]. However, don’t use the 1992 implementation for solving that equation. Chambolle described a better solution using a primal dual algorithm [4]. I’d like to say I understood that paper immediately and implemented it, however I instead lifted an implementation from Julia written by Tim Holy [5] and then rewrote in C++ for speed. (Julia isn’t as fast as they claim, especially when it has to JIT its whole standard library on boot.)

Solving this new form of the equation is done with theta close to zero. Each iteration U is solved with a fixed V. Then V is solved with a fixed U. Finally in preparation for the next iteration, theta is incremented. After many iterations, it is hoped that U and V then converge on the same disparity image.

The algorithm I’ve implemented for this blog post is considerably different from Heise’s paper. I’ve parameterized my disparity with an x and y offset. This is different from Heise’s x offset and normal vector x and y measurement. I also used a sum of absolute differences while Heise used adaptive support weights. Heise’s method also used a Huber cost metric on the smoothing term while I use nothing but squared error. This all means that my implementation is much closer to Steinbrucker’s optical flow algorithm.


Here is a result I got with cones using a 3×3 kernel size.

That 3×3 kernel size should impress you. I can greatly drop the kernel size when there is a smoothness constraint.

However images of cones and perspective shots in general are not what I care about, I care about the application of this algorithm to terrain. Here it is applied to a subsampled image of a glacier imaged by World View 2.

Crap! Instead of converging to some actual disparity value, it just blurred the initial result.

Disaster Analysis

Possibly this didn’t work because I didn’t tune parameters correctly. With all global matching algorithms there a bunch of lambdas that try to bend the two cost metrics to equal footing. But is this really the answer?

I think it has something to do with the fact that Patch Match produces the same result as brute force search result, a.k.a. the global minimum across all disparity values. If you look at the global minimum result for both of these input images shown below, you’ll see that the World View imagery is exceptionally ill conditioned.

It seems that the smoothness constraint isn’t strong enough to over come the initial global minimum from correlation. Despite the noise, in the ‘cones’ example it is clear there is just high frequency noise around the final disparity image that we want. However with the World View example, there is no such clear view. That disparity image should be a simple gradient along the horizontal. ASP’s take on this region is shown right. However since the texture is self-similar across the image, this creates patches of global minimum that are the incorrect and resemble palette knife marks.

Seeding the input correlation result with ASP’s rough solution helps things to some degree. But ultimately it seems that this convex relaxation trick doesn’t insure that we converge on the correct (and what I hope is the global) minimum from the original equation. Another detail that I learned is that if Patch Match is going to converge, it will happen in the first 10 iterations.

What’s Next?

My conclusion is that having a mostly globally minimum cost metric is required. Unfortunately increasing kernel size doesn’t help much. What I think is worth investigating next is: (1) implement the affine matching and second derivative smoothing detailed in Heise’s paper; (2) Cop out and implement hierarchical Patch Match; (3) recant my sins and investigate fSGM. Eventually I’ll find something good for ASP.


[1] Heise, Philipp, et al. “PM-Huber: PatchMatch with Huber Regularization for Stereo Matching.”
[2] Steinbrucker, Frank, Thomas Pock, and Daniel Cremers. “Large displacement optical flow computation withoutwarping.” Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
[3] Rudin, Leonid I., Stanley Osher, and Emad Fatemi. “Nonlinear total variation based noise removal algorithms.” Physica D: Nonlinear Phenomena 60.1 (1992): 259-268.
[4] Chambolle, Antonin. “An algorithm for total variation minimization and applications.” Journal of Mathematical imaging and vision 20.1-2 (2004): 89-97.