TL;DR: 3DEnhancer employs a multi-view diffusion model to enhance multi-view images, thus improving 3D models. Our contributions include a robust data augmentation pipeline and view-consistent blocks that integrate multi-view row attention and near-view epipolar aggregation modules to promote view consistency.
Despite advances in neural rendering, the scarcity of high-quality 3D datasets and the inherent limitations of multi-view diffusion models restrict view synthesis and 3D model generation to low resolutions with suboptimal multi-view consistency. In this study, we present a novel 3D enhancement pipeline, dubbed 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency. Our method includes a pose-aware encoder and a diffusion-based denoiser to refine low-quality multi-view images, along with data augmentation and a multi-view attention module with epipolar aggregation to maintain consistent, high-quality 3D outputs across views. Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles. Extensive evaluations show that 3DEnhancer significantly outperforms existing methods, boosting both multi-view enhancement and per-instance 3D optimization tasks.
By harnessing the power of generative priors, 3DEnhancer adapts a text-to-image diffusion model to a multi-view framework aimed at 3D enhancement. It is compatible with multi-view images generated by models like MVDream or those rendered from coarse 3D representations, such as NeRF and 3DGS. Given low-quality multi-view images along with their corresponding camera poses, 3DEnhancer aggregates multi-view information within a DiT framework using row attention and epipolar aggregation modules, improving visual quality while preserving high consistency across views. Furthermore, the model supports texture-level editing via text prompts and adjustable noise levels, allowing users to correct texture errors and control the enhancement strength.
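To make the row-attention idea concrete, here is a minimal NumPy sketch of multi-view row attention. It assumes the common setting where all views are rendered at the same elevation, so corresponding content across views lies roughly on the same image row; tokens from the same row of every view then attend to one another jointly. The function name `multi_view_row_attention` and the single-head, single-projection form are illustrative simplifications, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_row_attention(feats, Wq, Wk, Wv):
    """Single-head row attention across views (illustrative sketch).

    feats: (V, H, W, C) feature maps for V views.
    For each row index h, the V*W tokens on that row (gathered from
    all views) form one attention sequence, so information flows
    between views along matching rows.
    """
    V, H, W, C = feats.shape
    # Group tokens by row: (H, V*W, C)
    rows = feats.transpose(1, 0, 2, 3).reshape(H, V * W, C)
    q, k, v = rows @ Wq, rows @ Wk, rows @ Wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(C), axis=-1)
    out = attn @ v                                  # (H, V*W, C)
    # Ungroup back to per-view feature maps: (V, H, W, C)
    return out.reshape(H, V, W, C).transpose(1, 0, 2, 3)
```

Compared with full cross-view attention over all H*W*V tokens, restricting each query to its own row keeps the cost linear in image height while still mixing information across views.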
To control the enhancement strength, 3DEnhancer applies noise augmentation, adding a controllable amount of noise to the input multi-view images. Low noise levels generally favor faithful restoration, while high noise levels give the model more generative freedom, enabling a trade-off between fidelity and quality.
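The noise-augmentation step can be sketched as standard forward-diffusion noising, where the chosen timestep acts as the user-facing "enhancement strength" knob. A minimal NumPy sketch follows; the linear beta schedule and the helper name `noise_augment` are assumptions for illustration, not the paper's exact schedule.

```python
import numpy as np

def noise_augment(images, t, num_steps=1000, rng=None):
    """Add controllable Gaussian noise to input images (sketch).

    images: array of any shape, values roughly in [-1, 1].
    t: noise level in [0, num_steps); larger t -> more noise,
       i.e. less fidelity to the input and more generative freedom.
    Uses a DDPM-style linear beta schedule (an assumption here).
    """
    rng = rng or np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)          # cumulative signal ratio
    eps = rng.normal(size=images.shape)          # fresh Gaussian noise
    return np.sqrt(alpha_bar[t]) * images + np.sqrt(1.0 - alpha_bar[t]) * eps
```

Passing a small `t` keeps the output close to the coarse input (restoration), while a large `t` drowns most of the input signal so the denoiser regenerates detail (generation).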
Figure: coarse multi-view images (input) vs. enhanced multi-view images at increasing noise levels (less noise → more noise).
@article{luo20243denhancer,
  title={3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement},
  author={Luo, Yihang and Zhou, Shangchen and Lan, Yushi and Pan, Xingang and Loy, Chen Change},
  journal={arXiv preprint arXiv:2412.18565},
  year={2024},
}