Tango take the Wheel

Johnny Lee, my boss a couple links up, recently showed off at Google I/O a reel of recent research improvements on Tango. Possibly not as exciting as the other stuff in the video, but at the 1:19 mark in the video you’ll see some research by me and my coworkers to see how well Tango works on a car. As you can see it works really well and we even drove it 8 km in downtown San Francisco through tourist infested areas. Surprisingly or not, the mass number of people or the sun didn’t manage to blind out our tracking.

How did we do it? Well we took Tango phone and quick clamped it to my car. Seriously. Here’s a picture of a Lenovo Phab 2 Pro and a Asus Zenphone AR attach to my car and me in my driving glasses. We ran the current release of Tango and did motion tracking only and it just worked! … As long as you shut off the safeties that reset tracking once you exceed a certain velocity. Unfortunately you users outside of Google can’t access this ability as in a way these velocity restrictions are our own COCOM Limits.

Also, this is something really only achievable with the new commercial phones. The original Tango Development Kit didn’t include good IMU intrinsics calibration. The newly produced cellphones at the factory will solve for IMU scale, bias, and misalignment. At the end of each factory line, a worker places the phone in a robot for a calibration dance. Having this calibration is required for getting the low drift rates. Remember our IMUs are cheap 50 cent things and have a lot of wonkiness that the filter to needs to sort out.

3D Modelling with a R/C Quadcopter

Aarcon Racicot talks at FOSS4G 2014 about using his DJI Phantom 2 to model his community in several software packages including Agisoft Photoscan and Visual SFM.

* I want to add that Visual SFM can be run from the command line for batch processing and I believe that Visual SFM is an improvement over Bundler SFM.

Google’s Lens Blur Feature

Earlier today Google announced that they added a new feature to their android camera application called “Lens Blur”. From their blog post we can see that they take a series of images and solve for a multiview solution for the depth image of the first frame in the sequence. That depth image is then used to refocus the image. It is probably better described that they selectively blur an already fully focused image. This is similar to how they create bokeh in video games, however their blog states that they actually invoke the thin lens equations to get a realistic blur.

I thought this was really cool that they can solve for a depth image on a cellphone in only a few seconds. I also wanted access to that depth image so I can play with it and make 3D renderings! However that will come at another time.

Accessing the Data

Everything the GoogleCamera app needs to refocus an image is contained right inside the JPEG file recorded to /sdcard/DCIM/Camera after you perform a Lens Blur capture. Go ahead and download that IMG of the picture you took. How is this possible? It seems they use Adobe’s XMP format to store a depth image as PNG inside the header of the original JPEG. Unfortunately, I couldn’t figure out how to make usually tools, GDAL, read that.

So instead let’s do a manual method. If you open the JPEG file up in Emacs, right away you’ll see the XMP header which is human readable. Scroll down till you find the GDepth:Data section and copy everything between the quotes to a new file. Now things get weird. For whatever reason, the XMP format embeds binary containing strings of extension definition and a hash periodically through this binary PNG data you just copied. This doesn’t belong in the PNG and libpng will be very upset with you! Using Emacs again you can search this binary data for the http extension definition string and then delete everything between the \377 or OxFF upfront to the 8 bytes after the hash string. Depending on the file length you’ll have to perform this multiple times.

At this point you now have the raw PNG string! Unfortunately it is still in base64, so we need to decode it.

> base64 –D < header > header.png

Viola! You now have a depth image that you can open with any viewer.

Going back to the original XMP in the header of the source JPEG, you can find some interesting details on what these pixels mean like the following:

GFocus:BlurAtInfinity="0.036530018"
GFocus:FocalDistance="20.225327"
GFocus:FocalPointX="0.59722227"
GFocus:FocalPointY="0.5756945"
GImage:Mime="image/jpeg"
GDepth:Format="RangeInverse"
GDepth:Near="12.516363143920898"
GDepth:Far="47.468021392822266"
GDepth:Mime="image/png"

How did they do this?

There is no way for me to determine this. However looking at GoogleCamera we can see that it refers a librefocus.so which seems to contain the first half of the Lens Blur feature’s native code. Doing a symbol dump gives hints about what stereo algorithm they use.

librefocus.so:00221b68 D vtable for vision::optimization::belief_propagation::BinaryCost
librefocus.so:00221b88 D vtable for vision::optimization::belief_propagation::GridProblem
librefocus.so:00221bc0 D vtable for vision::optimization::belief_propagation::LinearTruncatedCost
librefocus.so:00221ca0 V vtable for vision::sfm::RansacSolver<vision::sfm::EssentialMatrixProblem>
librefocus.so:00221cb8 V vtable for vision::sfm::RansacSolver<vision::sfm::FundamentalMatrixProblem>
librefocus.so:00221c78 D vtable for vision::sfm::StdlibRandom
librefocus.so:00221b58 D vtable for vision::image::FixedPointPyramid
librefocus.so:00221c00 D vtable for vision::shared::EGLOffscreenContext
librefocus.so:00221800 V vtable for vision::shared::Progress
librefocus.so:00221be0 V vtable for vision::shared::GLContext
librefocus.so:00221cd0 V vtable for vision::stereo::PlaneSweep
librefocus.so:00221ce8 D vtable for vision::stereo::GPUPlaneSweep
librefocus.so:00221d20 D vtable for vision::stereo::PhotoConsistencySAD
librefocus.so:00221d00 V vtable for vision::stereo::PhotoConsistencyBase
librefocus.so:00221d60 D vtable for vision::stereo::MultithreadPlaneSweep
librefocus.so:00221ae8 V vtable for vision::features::FeatureDetectorInterface
librefocus.so:00221b00 D vtable for vision::features::fast::FastDetector
librefocus.so:00221d78 V vtable for vision::tracking::KLTPyramid
librefocus.so:00221a90 V vtable for vision::tracking::KLTPyramidFactory
librefocus.so:00221dd0 D vtable for vision::tracking::KLTFeatureDetector
librefocus.so:00221da0 D vtable for vision::tracking::GaussianPyramidFactory
librefocus.so:00221db8 D vtable for vision::tracking::LaplacianPyramidFactory

So it appears they use a KLT system that is fed to an SfM model to solve for camera extrinsics and intrinsics. Possibly they solve for the fundamental between the first sequential images, decompose out the intrinsics and then solve for the essential matrix on the remainder of the sequence. After that point, they use a GPU accelerated plane sweep stereo algorithm that apparently has some belief propagation smoothing. That’s interesting stereo choice given how old plane sweeping is and the lack of popularity in the Middlebury tests. However you can’t doubt the speed. Very cool!