These online papers and abstracts are listed in chronological order (most
recent first). Papers may be downloaded in Adobe Acrobat (pdf), postscript (ps),
or gzip-compressed postscript (.gz). PDF is generally the quickest to download.

**Estimating Cloth Simulation Parameters from Video**,
K. Bhat, C. D. Twigg, J. K. Hodgins, P. K. Khosla, Z. Popovic, and S. M. Seitz,
*Proc. Symposium on Computer Animation*, 2003, to appear.

(2.4M pdf).

Cloth simulations are notoriously difficult to tune due to the many parameters that must be adjusted to achieve the look of a particular fabric. In this paper, we present an algorithm for estimating the parameters of a cloth simulation from video data of real fabric. A perceptually motivated metric based on matching between folds is used to compare video of real cloth with simulation. This metric compares two video sequences of cloth and returns a number that measures the differences in their folds. Simulated annealing is used to minimize the frame-by-frame metric error between a given simulation and the real-world footage. To estimate all the cloth parameters, we identify simple static and dynamic calibration experiments that use small swatches of the fabric. To demonstrate the power of this approach, we use our algorithm to find the parameters for four different fabrics. We show the match between the video footage and simulated motion on the calibration experiments, on new video sequences for the swatches, and on a simulation of a full skirt.
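The parameter search described in the abstract is a standard simulated-annealing loop. A minimal, generic sketch in Python; the `cost` function here is a toy stand-in for the paper's fold-based video metric, and the three-parameter vector is a hypothetical illustration, not the paper's actual cloth parameterization:

```python
import math
import random

def simulated_annealing(cost, initial, neighbor, t0=1.0, cooling=0.995, steps=5000):
    """Generic simulated-annealing minimizer of the kind used to fit
    simulation parameters against a video-based error metric."""
    random.seed(0)  # deterministic for illustration
    x, fx = initial, cost(initial)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x)
        fy = cost(y)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta / temperature).
        if fy < fx or random.random() < math.exp((fx - fy) / max(t, 1e-12)):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling  # geometric cooling schedule
    return best, fbest

# Toy stand-in for the cloth metric: squared distance of hypothetical
# (stretch, bend, damping) parameters from "true" fabric values.
true_params = (2.0, -1.0, 0.5)
cost = lambda p: sum((a - b) ** 2 for a, b in zip(p, true_params))
neighbor = lambda p: tuple(a + random.gauss(0, 0.1) for a in p)
best, err = simulated_annealing(cost, (0.0, 0.0, 0.0), neighbor)
```

In the paper's setting, evaluating `cost` means running a full cloth simulation and comparing its rendered folds against the video, so each step is far more expensive than in this toy.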

**Shape and Materials by Example: A Photometric Stereo Approach**,
A. Hertzmann and S. M. Seitz, *Proc. Computer Vision and Pattern
Recognition Conf. (CVPR)*, 2003, to appear.

(1.3M pdf).

This paper presents a technique for computing the geometry of objects with general reflectance properties from images. For surfaces with varying material properties, a full segmentation into different material types is also computed. It is assumed that the camera viewpoint is fixed, but the illumination varies over the input sequence. It is also assumed that one or more example objects with similar materials and known geometry are imaged under the same illumination conditions. Unlike most previous work in shape reconstruction, this technique can handle objects with arbitrary and spatially-varying BRDFs. Furthermore, the approach works for arbitrary, distant, and unknown lighting environments. Finally, almost no calibration is needed, making the approach exceptionally simple to apply.

**Spacetime Stereo: Shape Recovery for Dynamic Scenes**,
L. Zhang, B. Curless, and S. M. Seitz, *Proc. Computer Vision and Pattern
Recognition Conf. (CVPR)*, 2003, to appear.

(1.6M pdf).

This paper extends the traditional binocular stereo problem into the spacetime domain, in which a pair of video streams is matched simultaneously instead of matching pairs of images frame by frame. Almost any existing stereo algorithm may be extended in this manner simply by replacing the image matching term with a spacetime term. By utilizing both spatial and temporal appearance variation, this modification reduces ambiguity and increases accuracy. Three major applications for spacetime stereo are proposed in this paper. First, spacetime stereo serves as a general framework for structured light scanning and generates high quality depth maps for static scenes. Second, spacetime stereo is effective for a class of natural scenes, such as waving trees and flowing water, which have repetitive textures and chaotic behaviors and are challenging for existing stereo algorithms. Third, the approach is one of very few existing methods that can robustly reconstruct objects that are moving and deforming over time, achieved by use of oriented spacetime windows in the matching procedure. Promising experimental results in the above three scenarios are demonstrated.

**Computing the Physical Parameters of Rigid-body Motion from Video**,
K. S. Bhat, S. M. Seitz, J. Popovic, and P. Khosla,
*Proc. European Conference on Computer Vision (ECCV)*, 2002, to appear.

(6.3M pdf).

This paper presents an optimization framework for estimating the motion and underlying physical parameters of a rigid body in free flight from video. The algorithm takes a video clip of a tumbling rigid body of known shape and generates a physical simulation of the object observed in the video clip. This solution is found by optimizing the simulation parameters to best match the motion observed in the video sequence. These simulation parameters include initial positions and velocities, environment parameters such as the gravity direction, and camera parameters. A global objective function computes the sum of squared differences between the silhouette of the object in simulation and the silhouette obtained from video at each frame. Applications include creating interesting rigid body animations, tracking complex rigid body motions in video, and estimating camera parameters from video.

**Rapid shape acquisition using color structured light and multi-pass dynamic programming**,
L. Zhang, B. Curless, and S. M. Seitz, *Proc. Symposium on 3D Data
Processing Visualization and Transmission (3DPVT)*, 2002, to appear.

(4.4M pdf).

This paper presents a color structured light technique for recovering object shape from one or more images. The technique works by projecting a pattern of stripes of alternating colors and matching the projected color transitions with observed edges in the image. The correspondence problem is solved using a novel, multi-pass dynamic programming algorithm that eliminates global smoothness assumptions and strict ordering constraints present in previous formulations. The resulting approach is suitable for generating both high-speed scans of moving objects when projecting a single stripe pattern and high-resolution scans of static scenes using a short sequence of time-shifted stripe patterns. In the latter case, spacetime analysis is used at each sensor pixel to obtain inter-frame depth localization. Results are demonstrated for a variety of complex scenes.

**Interactive Manipulation of Rigid Body Simulations**,
J. Popovic, S. M. Seitz, M. Erdmann, Z. Popovic, and A. Witkin,
*Proc. SIGGRAPH*, 2000, pp. 209-218.

(823K pdf).

Physical simulation of dynamic objects has become commonplace in computer graphics because it produces highly realistic animations. In this paradigm the animator provides a few physical parameters such as the objects’ initial positions and velocities, and the simulator automatically generates realistic motions. The resulting motion, however, is difficult to control because even a small adjustment of the input parameters can drastically affect the subsequent motion. Furthermore, the animator often wishes to change the end result of the motion instead of the initial physical parameters. We describe a novel interactive technique for intuitive manipulation of rigid multi-body simulations. Using our system, the animator can select bodies at any time and simply drag them to desired locations. In response, the system computes the required physical parameters and simulates the resulting motion. Surface characteristics such as normals and elasticity coefficients can also be automatically adjusted to provide a greater range of feasible motions, if the animator so desires. Because the entire simulation editing process runs at interactive speeds, the animator can rapidly design complex physical animations that would be difficult to achieve with existing rigid body simulators.

**Structure from Motion Without Correspondences**,
F. Dellaert, S. M. Seitz, C. E. Thorpe, and S. Thrun,
*Proc. Computer Vision and Pattern Recognition Conf. (CVPR)*, 2000.

(500K pdf).

A method is presented to recover 3D scene structure and camera motion from multiple images without the need for correspondence information. The problem is framed as finding the maximum likelihood structure and motion given only the 2D measurements, integrating over all possible assignments of 3D features to 2D measurements. This goal is achieved by means of an algorithm which iteratively refines a probability distribution over the set of all correspondence assignments. At each iteration a new structure from motion problem is solved, using as input a set of virtual measurements derived from this probability distribution. The distribution needed can be efficiently obtained by Markov Chain Monte Carlo sampling. The approach is cast within the framework of Expectation-Maximization, which guarantees convergence to a local maximizer of the likelihood. The algorithm works well in practice, as will be demonstrated using results on several real image sequences.

**Shape and Motion Carving in 6D**,
S. Vedula, S. Baker, S. Seitz, and T. Kanade,
*Proc. Computer Vision and Pattern Recognition Conf. (CVPR)*, 2000.

(560K pdf).

The motion of a non-rigid scene over time imposes more constraints on its structure than those derived from images at a single time instant alone. An algorithm is presented for simultaneously recovering dense scene shape and scene flow (i.e., the instantaneous 3D motion at every point in the scene). The algorithm operates by carving away hexels, or points in the 6D space of all possible shapes and flows, that are inconsistent with the images captured at either time instant, or across time. The recovered shape is demonstrated to be more accurate than that recovered using images at a single time instant. Applications of the combined scene shape and flow include motion capture for animation, re-timing of videos, and non-rigid motion analysis.

**A Theory of Shape by Space Carving**,
K. N. Kutulakos and S. M. Seitz, *International Journal of Computer Vision*,
Marr Prize Special Issue, 2000, **38(3)**, pp. 199-218. Earlier
version appeared in *Proc. Seventh International Conference on Computer
Vision (ICCV)*, 1999, pp. 307-314.

(1M pdf).

In this paper we consider the problem of computing the 3D shape of an unknown, arbitrarily-shaped scene from multiple photographs taken at known but arbitrarily-distributed viewpoints. By studying the equivalence class of all 3D shapes that reproduce the input photographs, we prove the existence of a special member of this class, the photo hull, that (1) can be computed directly from photographs of the scene, and (2) subsumes all other members of this class. We then give a provably-correct algorithm, called Space Carving, for computing this shape and present experimental results on complex real-world scenes. The approach is designed to (1) capture photorealistic shapes that accurately model scene appearance from a wide range of viewpoints, and (2) account for the complex interactions between occlusion, parallax, shading, and their view-dependent effects on scene-appearance.
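The core consistency test behind carving approaches like this one can be illustrated with a toy sketch: a candidate voxel survives only if the colors it projects to in all photographs agree. This is only a stand-in for intuition; the paper's actual algorithm also handles visibility ordering and real camera projection, and the `views` callables below are hypothetical.

```python
def carve(voxels, views, threshold=0.1):
    """Toy photo-consistency sweep: keep a voxel only if all views
    observe (approximately) the same color for it."""
    kept = []
    for v in voxels:
        colors = [view(v) for view in views]
        # Photo-consistency test: small spread of observed colors.
        if max(colors) - min(colors) < threshold:
            kept.append(v)
    return kept

# Two hypothetical "views" that report an observed gray level per voxel.
colors1 = {(0, 0, 0): 0.2, (1, 0, 0): 0.8}
colors2 = {(0, 0, 0): 0.2, (1, 0, 0): 0.1}  # the views disagree on the 2nd voxel
kept = carve([(0, 0, 0), (1, 0, 0)],
             [lambda v: colors1[v], lambda v: colors2[v]])
# Only the voxel whose projections agree survives the carving pass.
```

Iterating such a test until no more voxels can be removed is what yields the photo hull described in the abstract.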

**Omnivergent Stereo**,
H. Y. Shum, A. Kalai, and S. M. Seitz,
*Proc. Seventh International Conference on Computer Vision (ICCV)*, 1999, to appear.

(1.2M pdf).

The notion of a virtual sensor for optimal 3D reconstruction is introduced. Instead of planar perspective images that collect many rays at a fixed viewpoint, *omnivergent cameras* collect a small number of rays at many different viewpoints. The resulting 2D manifold of rays is arranged into two multiple-perspective images for stereo reconstruction. We call such images *omnivergent images*, and the process of reconstructing the scene from such images *omnivergent stereo*. This procedure is shown to produce 3D scene models with minimal reconstruction error, due to the fact that for any point in the 3D scene, two rays with maximum vergence angle can be found in the omnivergent images. Furthermore, omnivergent images are shown to have horizontal epipolar lines, enabling the application of traditional stereo matching algorithms without modification. Three types of omnivergent virtual sensors are presented: spherical omnivergent cameras, center-strip cameras, and dual-strip cameras.

**Implicit Representation and Scene Reconstruction from Probability Density Functions**,
S. M. Seitz and P. Anandan,
*Proc. Computer Vision and Pattern Recognition Conf.*, 1999, pp. 28-34.
Earlier version appeared in *Proc. DARPA Image Understanding Workshop*, Monterey, CA, 1998.

(120K pdf).

A technique is presented for representing linear features as probability density functions in two or three dimensions. Three chief advantages of this approach are (1) a unified representation and algebra for manipulating points, lines, and planes, (2) seamless incorporation of uncertainty information, and (3) a very simple recursive solution for maximum likelihood shape estimation. Applications to uncalibrated affine scene reconstruction are presented, with results on images of an outdoor environment.

**What Do N Photographs Tell Us about 3D Shape?**,
K. N. Kutulakos and S. M. Seitz, TR680, Computer Science Dept., U.
Rochester, January 1998.

(2.7M pdf).

In this paper we consider the problem of computing the 3D shape of an unknown, arbitrarily-shaped scene from multiple color photographs taken at known but arbitrarily-distributed viewpoints. By studying the equivalence class of all 3D shapes that reproduce the input photographs, we prove the existence of a special member of this class, the maximal photo-consistent shape, that (1) can be computed from an arbitrary volume that contains the scene, and (2) subsumes all other members of this class. We then give a provably-correct algorithm for computing this shape and present experimental results from applying it to the reconstruction of a real 3D scene from several photographs. The approach is specifically designed to (1) build 3D shapes that allow faithful reproduction of all input photographs, (2) resolve the complex interactions between occlusion, parallax, shading, and their effects on arbitrary collections of photographs of a scene, and (3) follow a "least commitment" approach to 3D shape recovery.

**Plenoptic Image Editing**,
S. M. Seitz and K. N. Kutulakos, *Proc. 6th Int. Conf. Computer Vision*, 1998, pp. 17-24.

(550K pdf, postscript, or 3.7M gzip'ed postscript). Earlier version available as
Technical Report 647, Computer Science Department, University of Rochester,
Rochester, NY, January 1997.

(postscript or 3.7M gzip'ed postscript)

This paper presents a new class of interactive image editing operations designed to maintain physical consistency between multiple images of a physical 3D object. The distinguishing feature of these operations is that edits to any one image propagate automatically to all other images as if the (unknown) 3D object had itself been modified. The approach is useful first as a power-assist that enables a user to quickly modify many images by editing just a few, and second as a means for constructing and editing image-based scene representations by manipulating a set of photographs. The approach works by extending operations like image painting, scissoring, and morphing so that they alter an object's plenoptic function in a physically-consistent way, thereby affecting object appearance from all viewpoints simultaneously. A key element in realizing these operations is a new volumetric decomposition technique for reconstructing an object's plenoptic function from an incomplete set of camera viewpoints.

S. M. Seitz, Proc. Computer Vision for Virtual Reality Workshop, 1998, pp. 17-24.

(440K pdf).

Interactive walkthrough applications require rendering an observed scene from a continuous range of target viewpoints. Toward this end, a novel approach is introduced that processes a set of input images to produce photorealistic scene reprojections over a wide range of viewpoints. This is achieved by (1) acquiring calibrated input images that are distributed throughout a target range of viewpoints to be modeled, and (2) computing a 3D reconstruction that is consistent in projection with all of the input images. The method avoids image correspondence problems by working in a discretized scene space whose voxels are traversed in a fixed visibility ordering. This strategy takes full account of occlusions and enables reconstructions of panoramic scenes. Promising initial results are presented for a room walkthrough.

S. M. Seitz, Ph.D. Dissertation, Computer Sciences Department Technical Report 1354, University of Wisconsin - Madison, October 1997. (1.7M pdf, postscript or 6.1M gzip'ed postscript)

This thesis addresses the problem of synthesizing images of real scenes under three-dimensional transformations in viewpoint and appearance. Solving this problem enables interactive viewing of remote scenes on a computer, in which a user can move a virtual camera through the environment and virtually paint or sculpt objects in the scene. It is demonstrated that a variety of three-dimensional scene transformations can be rendered on a video display device by applying simple transformations to a set of basis images of the scene. The virtue of these transformations is that they operate directly on images and recover only the scene information that is required in order to accomplish the desired effect. Consequently, they are applicable in situations where accurate three-dimensional models are difficult or impossible to obtain.

A central topic is the problem of *view synthesis*, i.e., rendering images of a real scene from different camera viewpoints by processing a set of basis images. Towards this end, two algorithms are described that warp and resample pixels in a set of basis images to produce new images that are physically valid, i.e., they correspond to what a real camera would see from the specified viewpoints. Techniques for synthesizing other types of transformations, e.g., non-rigid shape and color transformations, are also discussed. The techniques are found to perform well on a wide variety of real and synthetic images.

A basic question is uniqueness, i.e., for which views is the appearance of the scene uniquely determined from the information present in the basis views. An important contribution is a uniqueness result for the no-occlusion case, which proves that all views on the line segment between the two camera centers are uniquely determined from two uncalibrated views of a scene. Importantly, neither dense pixel correspondence nor camera information is needed. From this result, a *view morphing* algorithm is derived that produces high quality viewpoint and shape transformations from two uncalibrated images.

To treat the general case of many views, a novel *voxel coloring* framework is introduced that facilitates the analysis of ambiguities in correspondence and scene reconstruction. Using this framework, a new type of scene invariant, called a *color invariant*, is derived, which provides intrinsic scene information useful for correspondence and view synthesis. Based on this result, an efficient voxel-based algorithm is introduced to compute reconstructions and dense correspondence from a set of basis views. This algorithm has several advantages, most notably its ability to easily handle occlusion and views that are arbitrarily far apart, and its usefulness for *panoramic* visualization of scenes. These factors also make the voxel coloring approach attractive as a means for obtaining high-quality three-dimensional reconstructions from photographs.

S. M. Seitz and C. R. Dyer, International Journal of Computer Vision, 35(2), 1999, pp. 151-173. (1.3M pdf)

Shorter version in Proc. Computer Vision and Pattern Recognition Conf., 1997, 1067-1073. (235K pdf, postscript or 1.5M gzip'ed postscript)

A novel scene reconstruction technique is presented, different from previous approaches in its ability to cope with large changes in visibility and its modeling of intrinsic scene color and texture information. The method avoids image correspondence problems by working in a discretized scene space whose voxels are traversed in a fixed visibility ordering. This strategy takes full account of occlusions and allows the input cameras to be far apart and widely distributed about the environment. The algorithm identifies a special set of invariant voxels which together form a spatial and photometric reconstruction of the scene, fully consistent with the input images. The approach is evaluated with images from both inward-facing and outward-facing cameras.

S. M. Seitz and C. R. Dyer, Int. J. Computer Vision,

This paper presents a general framework for image-based analysis of 3D repeating motions that addresses two limitations in the state of the art. First, the assumption that a motion be perfectly even from one cycle to the next is relaxed. Real repeating motions tend not to be perfectly even, i.e., the length of a cycle varies through time because of physically important changes in the scene. A generalization of *period* is defined for repeating motions that makes this temporal variation explicit. This representation, called the period trace, is compact and purely temporal, describing the evolution of an object or scene without reference to spatial quantities such as position or velocity. Second, the requirement that the observer be stationary is removed. Observer motion complicates image analysis because an object that undergoes a 3D repeating motion will generally not produce a repeating sequence of images. Using principles of affine invariance, we derive necessary and sufficient conditions for an image sequence to be the projection of a 3D repeating motion, accounting for changes in viewpoint and other camera parameters. Unlike previous work in visual invariance, however, our approach is applicable to objects and scenes whose motion is highly non-rigid. Experiments on real image sequences demonstrate how the approach may be used to detect several types of purely temporal motion features, relating to motion trends and irregularities. Applications to athletic and medical motion analysis are discussed.

S. M. Seitz and C. R. Dyer, in Motion-Based Recognition, M. Shah and R. Jain, eds., Kluwer, Boston, 1997, 61-85.

**Bringing Photographs to Life with View
Morphing**

S. M. Seitz, Proc. Imagina 97, 1997, 153-158.
(
postscript or 5.1M
gzip'ed postscript)

Photographs and paintings are limited in the amount of information they can convey due to their inherent lack of motion and depth. Using image morphing methods, it is now possible to add 2D motion to photographs by moving and blending image pixels in creative ways. We have taken this concept a step further by adding the ability to convey three-dimensional motions, such as scene rotations and viewpoint changes, by manipulating one or more photographs of a scene. The effect transforms a photograph or painting into an interactive visualization of the underlying object or scene in which the world may be rotated in 3D. Several potential applications of this technology are discussed, in areas such as virtual reality, image databases, and special effects.

S. M. Seitz and C. R. Dyer, Proc. Image Understanding Workshop, 1997, 881-887. ( postscript or 515K gzip'ed postscript)

This paper analyzes the conditions under which a discrete set of images implicitly describes scene appearance for a continuous range of viewpoints. It is shown that two basis views of a static scene uniquely determine the set of all views on the line between their optical centers when a visibility constraint is satisfied. Additional basis views extend the range of predictable views to 2D or 3D regions of viewpoints. A simple scanline algorithm called *view morphing* is presented for generating these views from a set of basis images. The technique is applicable to both calibrated and uncalibrated images.

S. M. Seitz and C. R. Dyer, Proc. 13th Int. Conf. Pattern Recognition, Vol. I, Track A: Computer Vision, 1996, 84-89. (100K pdf, 1.2M postscript or 486K gzip'ed postscript) (Longer version appears as Computer Sciences Department Technical Report 1298 (postscript or 552K gzip'ed postscript).)

The question of which views may be inferred from a set of basis images is addressed. Under certain conditions, a discrete set of images implicitly describes scene appearance for a continuous range of viewpoints. In particular, it is demonstrated that two basis views of a static scene determine the set of all views on the line between their optical centers. Additional basis views further extend the range of predictable views to a two- or three-dimensional region of viewspace. These results are shown to apply under perspective projection subject to a generic visibility constraint called monotonicity. In addition, a simple scanline algorithm is presented for actually generating these views from a set of basis images. The technique, called *view morphing*, may be applied to both calibrated and uncalibrated images. At a minimum, two basis views and their fundamental matrix are needed. Experimental results are presented on real images. This work provides a theoretical foundation for image-based representations of 3D scenes by demonstrating that perspective view synthesis is a theoretically well-posed problem.

S. M. Seitz and C. R. Dyer, Proc. SIGGRAPH 96, 1996, 21-30. (500K pdf, 4.2M postscript or 1.6M gzip'ed postscript)

Image morphing techniques can generate compelling 2D transitions between images. However, differences in object pose or viewpoint often cause unnatural distortions in image morphs that are difficult to correct manually. Using basic principles of projective geometry, this paper introduces a simple extension to image morphing that correctly handles 3D projective camera and scene transformations. The technique, called *view morphing*, works by prewarping two images prior to computing a morph and then postwarping the interpolated images. Because no knowledge of 3D shape is required, the technique may be applied to photographs and drawings, as well as rendered scenes. The ability to synthesize changes both in viewpoint and image structure affords a wide variety of interesting 3D effects via simple image transformations.
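The prewarp/interpolate/postwarp recipe can be sketched for a single feature point. The warp callables below are hypothetical placeholders (in the actual method the prewarps and postwarp are derived from the fundamental matrix and a user-specified target view):

```python
def view_morph(p0, p1, s, prewarp0, prewarp1, postwarp):
    """Sketch of the three-step view-morphing recipe for one point:
    1. prewarp both basis points into parallel (rectified) views,
    2. linearly interpolate between them with blend factor s in [0, 1],
    3. postwarp the interpolated point into the desired output view."""
    q0 = prewarp0(p0)
    q1 = prewarp1(p1)
    # Between parallel views, linear interpolation of corresponding
    # points yields a physically valid in-between view.
    q = tuple((1 - s) * a + s * b for a, b in zip(q0, q1))
    return postwarp(q)

# With identity warps (views already parallel), the morph reduces to
# plain linear interpolation of corresponding points.
identity = lambda p: p
mid = view_morph((0.0, 0.0), (2.0, 4.0), 0.5, identity, identity, identity)
# mid == (1.0, 2.0)
```

A full implementation would apply the same three steps to every pixel, with the interpolation driven by dense or feature-based correspondences as in conventional image morphing.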

S. M. Seitz and C. R. Dyer, Proc. Workshop on Representation of Visual Scenes, 1995, 18-25. (300K pdf, postscript or 500K gzip'ed postscript)

Image warping is a popular tool for smoothly transforming one image to another. "Morphing" techniques based on geometric image interpolation create compelling visual effects, but the validity of such transformations has not been established. In particular, does 2D interpolation of two views of the same scene produce a sequence of physically valid in-between views of that scene? In this paper, we describe a simple image rectification procedure which guarantees that interpolation does in fact produce valid views, under generic assumptions about visibility and the projection process. Towards this end, it is first shown that two basis views are sufficient to predict the appearance of the scene within a specific range of new viewpoints. Second, it is demonstrated that interpolation of the rectified basis images produces exactly this range of views. Finally, it is shown that generating this range of views is a theoretically well-posed problem, requiring neither knowledge of camera positions nor 3D scene reconstruction. A scanline algorithm for view interpolation is presented that requires only four user-provided feature correspondences to produce valid orthographic views. The quality of the resulting images is demonstrated with interpolations of real imagery.

S. M. Seitz and C. R. Dyer, Proc. 5th Int. Conf. Computer Vision, 1995, 330-337. (300K pdf, postscript or 250K gzip'ed postscript)

A new technique is presented for computing 3D scene structure from point and line features in monocular image sequences. Unlike previous methods, the technique guarantees the completeness of the recovered scene, ensuring that every scene feature that is detected in each image is reconstructed. The approach relies on the presence of four or more reference features whose correspondences are known in all the images. Under an orthographic or affine camera model, the parallax of the reference features provides constraints that simplify the recovery of the rest of the visible scene. An efficient recursive algorithm is described that uses a unified framework for point and line features. The algorithm integrates the tasks of feature correspondence and structure recovery, ensuring that all reconstructible features are tracked. In addition, the algorithm is immune to outliers and feature-drift, two weaknesses of existing structure-from-motion techniques. Experimental results are presented for real images.

S. M. Seitz and C. R. Dyer, Proc. Workshop on Motion of Non-Rigid and Articulated Objects, 1994, 178-185. (postscript or 910K gzip'ed postscript)

Real cyclic motions tend not to be perfectly even, i.e., the period varies slightly from one cycle to the next, because of physically important changes in the scene. A generalization of period is defined for cyclic motions that makes periodic variation explicit. This representation, called the period trace, is compact and purely temporal, describing the evolution of an object or scene without reference to spatial quantities such as position or velocity. By delimiting cycles and identifying correspondences across cycles, the period trace provides a means of temporally registering a cyclic motion. In addition, several purely temporal motion features are derived, relating to the nature and location of irregularities. Results are presented using real image sequences and applications to athletic and medical motion analysis are discussed.

S. M. Seitz and C. R. Dyer, Proc. Computer Vision and Pattern Recognition Conf., 1994, 970-975. (postscript or 1M gzip'ed postscript)

(Different version appears as Computer Sciences Department Technical Report 1225 (postscript or 890K gzip'ed postscript).)

Current approaches for detecting periodic motion assume a stationary camera and place limits on an object's motion. These approaches rely on the assumption that a periodic motion projects to a set of periodic image curves, an assumption that is invalid in general. Using affine-invariance, we derive necessary and sufficient conditions for an image sequence to be the projection of a periodic motion. No restrictions are placed on either the motion of the camera or the object. Our algorithm is shown to be provably-correct for noise-free data and is extended to be robust with respect to occlusions and noise. The extended algorithm is evaluated with real and synthetic image sequences.

Return to Steve Seitz's home page

Last Changed: May 20, 2003