Google’s Lens Blur Feature

Earlier today Google announced that they added a new feature called “Lens Blur” to their Android camera application. From their blog post we can see that they take a series of images and compute a multiview solution for the depth image of the first frame in the sequence. That depth image is then used to refocus the image. It is probably better to say that they selectively blur an already fully focused image. This is similar to how bokeh is created in video games; however, their blog states that they actually invoke the thin lens equations to get a realistic blur.
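To make the thin lens idea concrete, here is a minimal sketch of how a per-pixel blur size could be derived from depth. It uses the textbook circle-of-confusion formula for a thin lens; the function name and parameters are my own, and nothing here is taken from Google's implementation.

```python
def blur_diameter(depth, focus_dist, focal_len, aperture):
    """Circle-of-confusion diameter on the sensor for a thin lens.

    All arguments share one length unit (e.g. meters):
      depth      - distance from the lens to the scene point
      focus_dist - distance at which the lens is focused
      focal_len  - focal length of the lens
      aperture   - diameter of the aperture
    This is the standard thin lens result, not Google's code.
    """
    if depth <= 0 or focus_dist <= focal_len:
        raise ValueError("need depth > 0 and focus_dist > focal_len")
    return (aperture * focal_len * abs(depth - focus_dist)
            / (depth * (focus_dist - focal_len)))
```

Points at the focus distance get zero blur, and the blur grows as a point moves away from the focal plane, which is what a depth-driven refocus needs.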

I thought it was really cool that they can solve for a depth image on a cellphone in only a few seconds. I also wanted access to that depth image so I could play with it and make 3D renderings! However, that will come at another time.

Accessing the Data

Everything the GoogleCamera app needs to refocus an image is contained right inside the JPEG file recorded to /sdcard/DCIM/Camera after you perform a Lens Blur capture. Go ahead and download the IMG file of the picture you took. How is this possible? It seems they use Adobe’s XMP format to store a depth image as a PNG inside the header of the original JPEG. Unfortunately, I couldn’t figure out how to make the usual tools, such as GDAL, read that.

So instead let’s do it manually. If you open the JPEG file in Emacs, right away you’ll see the XMP header, which is human readable. Scroll down until you find the GDepth:Data section and copy everything between the quotes to a new file. Now things get weird. For whatever reason, the XMP format periodically embeds binary chunks, each containing an extension definition string and a hash, throughout the PNG data you just copied. These don’t belong in the PNG, and libpng will be very upset with you! Using Emacs again, you can search the data for the http extension definition string and then delete everything from the \377 (0xFF) byte in front of it through the 8 bytes after the hash string. Depending on the file length, you’ll have to do this multiple times.

At this point you now have the raw PNG string! Unfortunately it is still in base64, so we need to decode it.

> base64 -D < header > header.png

Voilà! You now have a depth image that you can open with any viewer.
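If you would rather not do the cleanup by hand, the manual steps above can be sketched in Python. This assumes the chunks follow Adobe's extended XMP layout: each JPEG APP1 segment starts with the extension signature string, then a 32-byte hash, then 8 bytes of length and offset (the "8 bytes after the hash" mentioned earlier). The function names are my own, and I have only reasoned about this layout, not tested it against every Lens Blur file.

```python
import base64
import re
import struct

# Signature that opens every extended-XMP APP1 segment (null terminated).
EXT_SIG = b"http://ns.adobe.com/xmp/extension/\x00"
# signature + 32-byte hash + 4-byte total length + 4-byte chunk offset
EXT_HEADER_LEN = len(EXT_SIG) + 32 + 4 + 4


def extended_xmp(jpeg: bytes) -> bytes:
    """Join the payloads of all extended-XMP segments, headers stripped."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    chunks = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        body = jpeg[pos + 4:pos + 2 + length]  # length counts its own 2 bytes
        if marker == 0xE1 and body.startswith(EXT_SIG):
            (offset,) = struct.unpack(
                ">I", body[EXT_HEADER_LEN - 4:EXT_HEADER_LEN])
            chunks.append((offset, body[EXT_HEADER_LEN:]))
        pos += 2 + length
    chunks.sort(key=lambda chunk: chunk[0])  # reassemble in offset order
    return b"".join(data for _, data in chunks)


def depth_png(jpeg: bytes) -> bytes:
    """Return the decoded PNG bytes stored in the GDepth:Data attribute."""
    match = re.search(rb'GDepth:Data="([^"]*)"', extended_xmp(jpeg))
    if match is None:
        raise ValueError("no GDepth:Data attribute found")
    return base64.b64decode(match.group(1))
```

To use it, read the JPEG in binary mode, pass the bytes to depth_png, and write the result to a file such as header.png.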

Going back to the original XMP in the header of the source JPEG, you can find some interesting details on what these pixels mean like the following:


How did they do this?

There is no way for me to determine this for certain. However, looking at GoogleCamera we can see that it refers to a library which seems to contain the first half of the Lens Blur feature’s native code. Doing a symbol dump gives hints about what stereo algorithm they use.

D vtable for vision::optimization::belief_propagation::BinaryCost
D vtable for vision::optimization::belief_propagation::GridProblem
D vtable for vision::optimization::belief_propagation::LinearTruncatedCost
V vtable for vision::sfm::RansacSolver<vision::sfm::EssentialMatrixProblem>
V vtable for vision::sfm::RansacSolver<vision::sfm::FundamentalMatrixProblem>
D vtable for vision::sfm::StdlibRandom
D vtable for vision::image::FixedPointPyramid
D vtable for vision::shared::EGLOffscreenContext
V vtable for vision::shared::Progress
V vtable for vision::shared::GLContext
V vtable for vision::stereo::PlaneSweep
D vtable for vision::stereo::GPUPlaneSweep
D vtable for vision::stereo::PhotoConsistencySAD
V vtable for vision::stereo::PhotoConsistencyBase
D vtable for vision::stereo::MultithreadPlaneSweep
V vtable for vision::features::FeatureDetectorInterface
D vtable for vision::features::fast::FastDetector
V vtable for vision::tracking::KLTPyramid
V vtable for vision::tracking::KLTPyramidFactory
D vtable for vision::tracking::KLTFeatureDetector
D vtable for vision::tracking::GaussianPyramidFactory
D vtable for vision::tracking::LaplacianPyramidFactory

So it appears they use a KLT tracker whose output is fed to an SfM model to solve for camera extrinsics and intrinsics. Possibly they solve for the fundamental matrix between the first images in the sequence, decompose out the intrinsics, and then solve for the essential matrix on the remainder of the sequence. After that, they use a GPU-accelerated plane sweep stereo algorithm that apparently has some belief propagation smoothing. That’s an interesting stereo choice given how old plane sweeping is and its lack of popularity in the Middlebury tests. However, you can’t doubt the speed. Very cool!
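For readers who have not met plane sweeping, here is a toy sketch of the core idea. It assumes a rectified image pair, where sweeping fronto-parallel depth planes reduces to testing a set of candidate disparities. It works on a single scanline, scores each candidate with a sum of absolute differences (SAD), and picks the cheapest one per pixel. There is no belief propagation smoothing and no GPU, and the function name is my own; this is an illustration, not a reconstruction of Google's code.

```python
def sweep_scanline(ref, other, disparities, radius=1):
    """Winner-take-all disparity for each pixel of one rectified scanline.

    ref, other  - lists of pixel intensities from the two views
    disparities - candidate shifts to test (one per swept depth plane)
    radius      - half-width of the SAD comparison window
    A scene point at column x in `ref` is assumed to appear at column
    x - d in `other` when d is the correct disparity.
    """
    n = len(ref)
    best = []
    for x in range(n):
        costs = []
        for d in disparities:
            sad = 0
            for k in range(-radius, radius + 1):
                xr = min(max(x + k, 0), n - 1)      # clamp at the borders
                xo = min(max(x + k - d, 0), n - 1)
                sad += abs(ref[xr] - other[xo])
            costs.append((sad, d))
        best.append(min(costs)[1])  # lowest cost wins; ties go to smaller d
    return best
```

A real system would sweep planes through all views using the recovered camera poses and then regularize the cost volume, which is presumably where the belief propagation symbols come in.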