Total Selfie: Generating Full-Body Selfies


Total Selfie generates full-body selfies (right), similar to a photo someone else would take of you in a given scene. At the start of the day, users pre-capture a video of themselves showing their shoes, pants, and outfit (left). They can also choose one or more target poses for the day's photos. They can then take any number of on-site image pairs (selfie + background; middle), each of which produces a full-body image at that location.


We present a method to generate full-body selfies: photos that you take of yourself but that capture your whole body, as if someone else had taken the photo from a few feet away. Our approach takes as input a pre-captured video of your body, a target pose photo, and a selfie + background pair for each location. We introduce a novel diffusion-based approach that combines all of this information into high-quality, well-composed photos of you in the desired pose and background.


Total Selfie


Pipeline of Total Selfie. Given selfie video frames of different body parts (blue box), Region-Aware Generation (green box) trains a multi-concept DreamBooth to generate an initial full-body image \(I_g\) in the background \(I_b\) with the target pose \(I_t\). Appearance Refinement (orange box) refines the face region of \(I_g\) by incorporating the expression from the on-site selfie \(I_s\) with perspective undistortion. Other body parts (e.g., clothing) are refined in a similar way, with slight modifications. The refined image is denoted \(I_r\). Image Harmonization (purple box) uses a diffusion prior with appropriate guidance to correct unnatural regions in the refined image, producing the final output \(I_h\).
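The three stages in the pipeline run in sequence, each consuming the previous stage's output. The Python sketch below shows only that data flow between the symbols named in the caption (\(I_b\), \(I_t\), \(I_s\), \(I_g\), \(I_r\), \(I_h\)). The function names, the `Image` placeholder, and the concept labels are hypothetical, and every stage is a stub that records what it consumed rather than an actual DreamBooth, refinement, or diffusion model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Image:
    """Hypothetical stand-in for an image; it only records how it was produced."""
    name: str
    history: List[str] = field(default_factory=list)


def region_aware_generation(concepts: List[str], background: Image, target_pose: Image) -> Image:
    # Stub for stage 1: a multi-concept DreamBooth, fine-tuned on the selfie
    # video frames, would render the person in background I_b with pose I_t.
    step = f"generate({'+'.join(concepts)} | {background.name}, {target_pose.name})"
    return Image("I_g", [step])


def appearance_refinement(generated: Image, onsite_selfie: Image) -> Image:
    # Stub for stage 2: refine the face using the expression from the on-site
    # selfie I_s (after perspective undistortion), then other body parts.
    step = f"refine(face, body | {onsite_selfie.name})"
    return Image("I_r", generated.history + [step])


def image_harmonization(refined: Image) -> Image:
    # Stub for stage 3: a guided diffusion prior would repair unnatural regions.
    return Image("I_h", refined.history + ["harmonize(diffusion prior)"])


def total_selfie(concepts: List[str], background: Image, target_pose: Image,
                 onsite_selfie: Image) -> Image:
    # Chain the stages: I_g -> I_r -> I_h.
    i_g = region_aware_generation(concepts, background, target_pose)
    i_r = appearance_refinement(i_g, onsite_selfie)
    return image_harmonization(i_r)


if __name__ == "__main__":
    # Concepts learned once per day from the pre-captured video (hypothetical labels).
    concepts = ["face", "outfit", "pants", "shoes"]
    target_pose = Image("I_t")
    # Any number of on-site (selfie, background) pairs reuse the same concepts.
    for loc in ["park", "cafe"]:
        out = total_selfie(concepts, Image(f"I_b@{loc}"), target_pose, Image(f"I_s@{loc}"))
        print(loc, "->", out.name, out.history)
```

Because the concepts are learned once from the day's pre-captured video, only the background and the on-site selfie change from one location to the next, which mirrors the workflow of capturing once per day and then taking any number of on-site pairs.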



Total Selfie has several limitations: (1) The shading in the generated full-body image may not match the actual photo. This happens when the shading in the initial full-body image (generated by DreamBooth) differs greatly from the shading in the on-site selfie. One avenue for future work is to use the on-site selfie to guide the region-aware generation.
