We present a method to generate full-body selfies: photos that you take of yourself, but that capture your whole body as if someone else had taken the photo from a few feet away. Our approach takes as input a pre-captured video of your body, a target pose photo, and a selfie-plus-background pair for each location. We introduce a novel diffusion-based approach that combines all of this information into high-quality, well-composed photos of you in the desired pose and background.
Pipeline of Total Selfie. Given selfie video frames of different body parts (blue box), Region-Aware Generation (green box) trains a multi-concept DreamBooth to generate an initial full-body image \(I_g\) in the background \(I_b\) with the target pose \(I_t\). Appearance Refinement (orange box) refines the face region of \(I_g\) by incorporating the expression from the on-site selfie \(I_s\) with perspective undistortion. Other body parts (e.g., clothing) are also refined using a similar idea with slight modifications, yielding the refined image \(I_r\). Image Harmonization (purple box) harmonizes the refined image to fix unnatural regions using a diffusion prior with appropriate guidance, producing the final output \(I_h\).
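The three-stage data flow described above can be sketched as function composition over image tensors. This is a minimal illustrative sketch, not the actual implementation: the three stage functions are hypothetical placeholders (the real stages sample a fine-tuned diffusion model and perform guided inpainting), and only the order of the stages and the intermediate images \(I_g \rightarrow I_r \rightarrow I_h\) are taken from the caption.

```python
import numpy as np

# Hypothetical placeholders for the three stages of the Total Selfie
# pipeline. Each operates on HxWx3 float images in [0, 1]; the real
# stages would invoke diffusion models rather than these simple ops.

def region_aware_generation(selfie_frames, I_t, I_b):
    """Produce an initial full-body image I_g in background I_b with
    pose I_t. (Placeholder: the paper samples a multi-concept
    DreamBooth model trained on the selfie video frames.)"""
    I_g = I_b.copy()  # stand-in: start from the background canvas
    return I_g

def appearance_refinement(I_g, I_s):
    """Refine the face (and clothing) of I_g using the on-site selfie
    I_s after perspective undistortion. (Placeholder blend.)"""
    I_r = 0.5 * I_g + 0.5 * I_s  # stand-in for guided refinement
    return I_r

def image_harmonization(I_r):
    """Harmonize unnatural regions with a diffusion prior.
    (Placeholder: just clamp to a valid image range.)"""
    I_h = np.clip(I_r, 0.0, 1.0)
    return I_h

def total_selfie(selfie_frames, I_s, I_t, I_b):
    """Wire the stages together in the order given in the caption."""
    I_g = region_aware_generation(selfie_frames, I_t, I_b)
    I_r = appearance_refinement(I_g, I_s)
    I_h = image_harmonization(I_r)
    return I_h

# Toy inputs with made-up dimensions, to show the expected shapes.
H, W = 64, 48
frames = [np.random.rand(H, W, 3) for _ in range(4)]  # selfie video frames
I_s = np.random.rand(H, W, 3)  # on-site selfie
I_t = np.random.rand(H, W, 3)  # target pose photo
I_b = np.random.rand(H, W, 3)  # background photo
I_h = total_selfie(frames, I_s, I_t, I_b)
print(I_h.shape)  # (64, 48, 3)
```

The point of the sketch is the staged structure: each stage consumes the previous stage's output plus one additional input (pose, on-site selfie), so the stages can be developed and evaluated independently.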
Total Selfie has several limitations: (1) The shading in the generated full-body image may not match the shading a real photo taken at the scene would have. This occurs when the shading in the initial full-body image (generated by DreamBooth) differs greatly from the shading in the on-site selfie. A potential avenue for future work is to harness the on-site selfie to guide the region-aware generation stage.