Google’s Lens Blur Feature

Earlier today Google announced that they added a new feature to their android camera application called “Lens Blur”. From their blog post we can see that they take a series of images and solve for a multiview solution for the depth image of the first frame in the sequence. That depth image is then used to refocus the image. It is probably better described that they selectively blur an already fully focused image. This is similar to how they create bokeh in video games, however their blog states that they actually invoke the thin lens equations to get a realistic blur.

I thought this was really cool that they can solve for a depth image on a cellphone in only a few seconds. I also wanted access to that depth image so I can play with it and make 3D renderings! However that will come at another time.

Accessing the Data

Everything the GoogleCamera app needs to refocus an image is contained right inside the JPEG file recorded to /sdcard/DCIM/Camera after you perform a Lens Blur capture. Go ahead and download that IMG of the picture you took. How is this possible? It seems they use Adobe’s XMP format to store a depth image as PNG inside the header of the original JPEG. Unfortunately, I couldn’t figure out how to make usually tools, GDAL, read that.

So instead let’s do a manual method. If you open the JPEG file up in Emacs, right away you’ll see the XMP header which is human readable. Scroll down till you find the GDepth:Data section and copy everything between the quotes to a new file. Now things get weird. For whatever reason, the XMP format embeds binary containing strings of extension definition and a hash periodically through this binary PNG data you just copied. This doesn’t belong in the PNG and libpng will be very upset with you! Using Emacs again you can search this binary data for the http extension definition string and then delete everything between the \377 or OxFF upfront to the 8 bytes after the hash string. Depending on the file length you’ll have to perform this multiple times.

At this point you now have the raw PNG string! Unfortunately it is still in base64, so we need to decode it.

> base64 –D < header > header.png

Viola! You now have a depth image that you can open with any viewer.

Going back to the original XMP in the header of the source JPEG, you can find some interesting details on what these pixels mean like the following:

GFocus:BlurAtInfinity="0.036530018"
GFocus:FocalDistance="20.225327"
GFocus:FocalPointX="0.59722227"
GFocus:FocalPointY="0.5756945"
GImage:Mime="image/jpeg"
GDepth:Format="RangeInverse"
GDepth:Near="12.516363143920898"
GDepth:Far="47.468021392822266"
GDepth:Mime="image/png"

How did they do this?

There is no way for me to determine this. However looking at GoogleCamera we can see that it refers a librefocus.so which seems to contain the first half of the Lens Blur feature’s native code. Doing a symbol dump gives hints about what stereo algorithm they use.

librefocus.so:00221b68 D vtable for vision::optimization::belief_propagation::BinaryCost
librefocus.so:00221b88 D vtable for vision::optimization::belief_propagation::GridProblem
librefocus.so:00221bc0 D vtable for vision::optimization::belief_propagation::LinearTruncatedCost
librefocus.so:00221ca0 V vtable for vision::sfm::RansacSolver<vision::sfm::EssentialMatrixProblem>
librefocus.so:00221cb8 V vtable for vision::sfm::RansacSolver<vision::sfm::FundamentalMatrixProblem>
librefocus.so:00221c78 D vtable for vision::sfm::StdlibRandom
librefocus.so:00221b58 D vtable for vision::image::FixedPointPyramid
librefocus.so:00221c00 D vtable for vision::shared::EGLOffscreenContext
librefocus.so:00221800 V vtable for vision::shared::Progress
librefocus.so:00221be0 V vtable for vision::shared::GLContext
librefocus.so:00221cd0 V vtable for vision::stereo::PlaneSweep
librefocus.so:00221ce8 D vtable for vision::stereo::GPUPlaneSweep
librefocus.so:00221d20 D vtable for vision::stereo::PhotoConsistencySAD
librefocus.so:00221d00 V vtable for vision::stereo::PhotoConsistencyBase
librefocus.so:00221d60 D vtable for vision::stereo::MultithreadPlaneSweep
librefocus.so:00221ae8 V vtable for vision::features::FeatureDetectorInterface
librefocus.so:00221b00 D vtable for vision::features::fast::FastDetector
librefocus.so:00221d78 V vtable for vision::tracking::KLTPyramid
librefocus.so:00221a90 V vtable for vision::tracking::KLTPyramidFactory
librefocus.so:00221dd0 D vtable for vision::tracking::KLTFeatureDetector
librefocus.so:00221da0 D vtable for vision::tracking::GaussianPyramidFactory
librefocus.so:00221db8 D vtable for vision::tracking::LaplacianPyramidFactory

So it appears they use a KLT system that is fed to an SfM model to solve for camera extrinsics and intrinsics. Possibly they solve for the fundamental between the first sequential images, decompose out the intrinsics and then solve for the essential matrix on the remainder of the sequence. After that point, they use a GPU accelerated plane sweep stereo algorithm that apparently has some belief propagation smoothing. That’s interesting stereo choice given how old plane sweeping is and the lack of popularity in the Middlebury tests. However you can’t doubt the speed. Very cool!

PC Align

Last month we had a new release of Ames Stereo Pipeline, version 2.3! We’ve performed a lot of bug fixing and implementing new features. But my two new prized features in ASP are pc_align and lronac2mosaic. Today I’d like to only introduce pc_align, a utility for registering DEMs, LIDAR points, and ASP point clouds to each other. All you have to do is specify an input, a reference, and an approximate estimate for how bad you think the misplacement is.

Does that sound like magic? Under the hood, pc_align is performing an implementation of the iterative closest point algorithm (ICP). Specifically we are using internally libpointmatcher library from ETH. What ICP does is iteratively attempt to match every point of the input with the nearest neighbor in the reference point cloud. ICP then solves for a transform that would globally reduce the distances between the current set of matches. Then it repeats itself and performs a new round of matching input to reference and then again solves for another global step. Repeat, repeat, repeat until we no longer see any improvements in the sum of distances between matches.

This means that failure cases for PC_align are when the reference set is too coarse to describe the features seen in the input. An example would be having a HiRISE DEM and then only having a single orbit of MOLA that intersects. A single line of MOLA does nothing to constrain the DEM about the axis of the shot line. The 300 meter post spacing might also not be detailed enough to constrain a DEM that is looking at small features such as dunes on a mostly flat plane. What is required is a large feature that both the DEM and the LIDAR source can resolve.

CTX to HRSC example

Enough about how it works. Examples! I’m going to be uncreative and just process CTX and Gale Crater because they’re fast to process and easy to find. I’ve processed the CTX images P21_009149_1752_XI_04S222W, P21_009294_1752_XI_04S222W, P22_009650_1772_XI_02S222W, and P22_009716_1773_XI_02S223W using stereo options “–alignment affineepipolar –subpixel-mode 1”. This means correlation happens in 15 minutes and then triangulation takes an hour because ISIS subroutines are not thread safe. I’ve plotted these two DEMs on top of a DLR HRSC product, H1927_0000_DT4.IMG. It is important to note that this version of the DLR DEM is referenced against the Mars ellipsoid and not the Aeroid. If your data is referenced against the Aeroid, you’ll need to use dem_geoid to temporarily remove it for processing with pc_align who only understand ellipsoidal datums.

In the above picture, the two ASP created DEMs stick out like sore thumbs and are misplaced by some 200 meters. This is due to pointing information for MRO and subsequently CTX being imperfect (how much can you ask for anyway?). You could run jigsaw and that is the gold standard solution, but that takes a lot of manual effort. So instead let’s use pc_align with the commands shown next.

> pc_align --max-displacement 200 H1927_0000_DT4.tif P22-PC.tif \
        --save-transformed-source-points  -o P22_align/P22_align \
> point2dem --t_srs "+proj=sinu +lon_0=138 +x_0=0 +y_0=0 \
        +a=3396000 +b=3396000 +units=m +no_defs" \
        --nodata -32767 P22_align/P22_align-trans_source.tif

There are 3 important observations to make from the command line above. (1) We are using the max displacement option to set the upper bound of how bad we think we are, 200 meters. (2) I’m feeding the ASP PC file as the input source instead of ASP’s DEM. This is because in (3) with the save transformed source points we’ll be writing out another PC file. PC align can only export PC files so we always have to perform another round of point2dem. It is possible to run stereo -> point2dem -> PC Align -> point2dem, but it means you are unneccesarily resampling your data once. Using the PC file directly from stereo saves us from potential aliasing and removes a point2dem call.

Here’s the final result where both CTX DEMs are plotted on top of the HRSC DEM. Everything looks really good except for that left edge. This might be because the DEM was rendered with incorrect geometry and the output DEM is subtly warped from a perfect solution.

Another cool feature that pc_align author Oleg Alexandrov added was recording the beginning and ending matching errors in CSV files. They’re found with the names <prefix>-{beg,end}_errors.csv. You can load those up in QGIS and plot theirs errors to visualize that pc_align uniformly reduced matching error across the map. (Thanks Ross for showing me how to do this!)

CTX to MOLA

Quite a few MOLA shots can be found inside the CTX footprints. Above is a plot of all MOLA PEDR data for Gale crater. I was able to download this information in CSV format from the MOLA PEDR Query tool from Washington University St. Louis. Conveniently PC_align can read CSV files, just not in the format provided by this tool. PC_align is expecting the data to be in format long, lat, elevation against ellipsoid. What is provide is lat, long, elevation against aeroid, and then radius. So I had to manually edit the CSV in Excel to be in the correct order and create my elevation values by subtracting 3396190 (this number was wrong in first draft) from the radius column. The other added bit of information needed with CSV files is that you’ll need to define the datum to use. If you don’t, pc_align will assume you’re using WGS84.

> pc_align --max-displacement 200 P22-PC.tif mola.csv\
        -o P22_mola/P22_mola --datum D_MARS --save-inv-trans \
> point2dem --t_srs "+proj=sinu +lon_0=138 +x_0=0 +y_0=0 \
        +a=3396000 +b=3396000 +units=m +no_defs" \
        --nodata -32767 P22_mola/P22_mola-trans_reference.tif

Two things to notice in these commands, the inputs are backwards from before and I’m saving the inverse transform.  You can keep things in the same order as when I was aligning to HRSC,  it is just that things will run very slowly. For performance reasons, the denser source should be considered the reference and then you must request the reference to be transformed to the source. You’ll likely routinely be using this inverse form with LIDAR sources.

In the end I was able to reduce alignment error for my CTX DEMs from being over 50 meters to being less than 15 meters against MOLA and from over 100 meter to 40 meters error against HRSC. A result I’m quite happy with for a single night processing at home. You can see my final composited MOLA registered CTX DEMs on the left. The ASP team will have more information about pc_align in LPSC abstract form next year. We also hope that you try out pc_align and find it worth regular use in your research.

Update:

I goofed in the MOLA example! Using D_MARS implies a datum that is a sphere with 3396190 meter radius. I subtracted the wrong number from MOLA’s radius measurement before (the value 3396000). That probably had some effect on the registration result shown in the pictures, but this mistake is smaller than the shot spacing of MOLA. Meaning the horizontal registration is fine, but my output DTMs are 190 meters higher than they should have been. FYI, D_MOON implies a datum that is a sphere with radius 1737400 meters.