Andrew Fitzgibbon (Microsoft Research Cambridge)
Fri 24 Apr 2015, 14:00 - 15:00
Informatics Forum (IF-4.31/4.33)

If you have a question about this talk, please contact: Bob Fisher (rbf)

Vision is naturally concerned with shape. If we could recover a stable and compact representation of object shape from images, we would hope it might aid with numerous vision tasks. Just the silhouette of an object is a strong cue to its identity, and the silhouette is generated by its 3D shape. In computer vision, many representations have been explored: collections of points, “simple” shapes like ellipsoids or polyhedra, algebraic surfaces and other implicit surfaces, generalized cylinders and ribbons, and piecewise (rational) polynomial representations like NURBS and subdivision surfaces. Many of these can be embedded more or less straightforwardly into probabilistic shape spaces, and recovery (a.k.a. “learning”) of one such space is the goal of the experimental part of this talk.

When recovering shape from measurements, there is at first sight a natural hierarchy of stability: straight lines can represent very little but may be robustly recovered from data, then come conic sections, splines with fixed knots, and general piecewise representations. I will show, however, that one can pass almost immediately to piecewise representations without loss of robustness. In particular, I will show how a popular representation in computer graphics, subdivision curves and surfaces, may readily be fit to a variety of image data using the technique for ellipse fitting introduced by Gander, Golub, and Strebel in 1994. I will also show how we can address the previously difficult problem of recovering 3D shape from multiple silhouettes, and the considerably harder problem which arises when the silhouettes are not from the same object instance, but from members of an object class, for example 30 images of different dolphins, each in a different pose. This requires that we simultaneously learn the shape space and the projections from each instance into its image. This simultaneous optimization is reminiscent of the bundle adjustment problem in computer vision, and indeed our most recent application, to tracking the human hand, makes good use of the Ceres Solver.
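
As a rough illustration of the fitting idea mentioned above, the sketch below fits a circle (the simplest closed curve, standing in here for an ellipse or a subdivision curve) by treating each data point's curve parameter as an extra unknown and optimizing it jointly with the shape parameters, in the spirit of the parametric geometric fit of Gander, Golub, and Strebel. This is not the speaker's implementation: the function names `solve` and `fit_circle_lifted`, the centroid initialization, and the use of undamped Gauss-Newton steps are all assumptions made for the example.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_circle_lifted(points, iters=20):
    """Fit centre (cx, cy) and radius R jointly with one angle t_i per point.

    Residual for point i: (cx + R*cos(t_i) - x_i, cy + R*sin(t_i) - y_i).
    Hypothetical helper for illustration only (plain Gauss-Newton, no damping).
    """
    n = len(points)
    # Crude initial guess: centroid, mean distance, and polar angles.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    R = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    t = [math.atan2(y - cy, x - cx) for x, y in points]
    m = 3 + n  # unknowns: cx, cy, R, t_1..t_n
    for _ in range(iters):
        H = [[0.0] * m for _ in range(m)]  # J^T J
        g = [0.0] * m                      # J^T r
        for i, (x, y) in enumerate(points):
            c, s = math.cos(t[i]), math.sin(t[i])
            res = (cx + R * c - x, cy + R * s - y)
            # Sparse Jacobian rows as {unknown index: derivative}.
            rows = (
                {0: 1.0, 2: c, 3 + i: -R * s},
                {1: 1.0, 2: s, 3 + i: R * c},
            )
            for row, r in zip(rows, res):
                for a, va in row.items():
                    g[a] += va * r
                    for b, vb in row.items():
                        H[a][b] += va * vb
        delta = solve(H, [-v for v in g])
        cx += delta[0]
        cy += delta[1]
        R += delta[2]
        t = [ti + d for ti, d in zip(t, delta[3:])]
    return cx, cy, R
```

Given noise-free samples of a circle, `fit_circle_lifted` should recover the centre and radius closely. The sparsity of the system (each angle touches only its own point's residuals) is the same structure that bundle adjustment solvers exploit, which is one way to read the connection drawn in the abstract.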