publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2023
- RecolorNeRF: Layer Decomposed Radiance Fields for Efficient Color Editing of 3D Scenes. In Proceedings of the 31st ACM International Conference on Multimedia, 2023
Radiance fields have gradually become a mainstream representation of media. Although their appearance editing has been studied, how to achieve view-consistent recoloring in an efficient manner remains underexplored. We present RecolorNeRF, a novel user-friendly color editing approach for neural radiance fields. Our key idea is to decompose the scene into a set of pure-colored layers that form a palette. In this way, color manipulation can be conducted by directly altering the color components of the palette. To support efficient palette-based editing, the color of each layer needs to be as representative as possible. The problem is ultimately formulated as an optimization in which the layers and their blending weights are jointly optimized with the NeRF itself. Extensive experiments show that our jointly optimized layer decomposition works with multiple backbones and produces photo-realistic recolored novel-view renderings. We demonstrate that RecolorNeRF outperforms baseline methods both quantitatively and qualitatively for color editing, even in complex real-world scenes.
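A minimal sketch of the palette-based editing idea described above, assuming hypothetical arrays `palette` and `weights` (in the paper both are optimized jointly with the radiance field rather than set by hand): each rendered color is a convex combination of pure layer colors, so recoloring reduces to swapping a palette entry while the blending weights stay fixed.

```python
import numpy as np

# Hypothetical palette of K pure RGB colors in [0, 1]; illustrative values only.
palette = np.array([
    [0.85, 0.10, 0.10],   # "red" layer
    [0.10, 0.60, 0.15],   # "green" layer
    [0.95, 0.90, 0.80],   # "background" layer
])

# Per-sample blending weights (N, K); rows sum to one so each output color is
# a convex combination of the layer colors.
weights = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
])

def composite(weights, palette):
    """Blend layer weights (N, K) with palette colors (K, 3) into RGB (N, 3)."""
    return weights @ palette

original = composite(weights, palette)

# Recoloring: edit one palette entry and re-composite; geometry and blending
# weights are untouched, which is what keeps the edit view-consistent.
edited_palette = palette.copy()
edited_palette[0] = [0.10, 0.20, 0.90]   # turn the red layer blue
recolored = composite(weights, edited_palette)
print(original, recolored, sep="\n")
```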
- SeamlessNeRF: Stitching Part NeRFs with Gradient Propagation. In SIGGRAPH Asia 2023 Conference Papers, 2023
Neural Radiance Fields (NeRFs) have emerged as promising digital media for 3D objects and scenes, sparking a surge in research to extend the editing capabilities in this domain. The task of seamless editing and merging of multiple NeRFs, resembling “Poisson blending” in 2D image editing, remains a critical operation that is under-explored by existing work. To fill this gap, we propose SeamlessNeRF, a novel approach for seamless appearance blending of multiple NeRFs. Specifically, we aim to optimize the appearance of a target radiance field in order to harmonize its merge with a source field. We propose a well-tailored optimization procedure for blending, constrained by 1) pinning the radiance color in the intersecting boundary area between the source and target fields and 2) maintaining the original gradient of the target. Extensive experiments validate that our approach can effectively propagate the source appearance from the boundary area to the entire target field through the gradients. To the best of our knowledge, SeamlessNeRF is the first work to introduce gradient-guided appearance editing to radiance fields, offering a solution for seamless stitching of 3D objects represented as NeRFs. Our code and more results are available at https://sites.google.com/view/seamlessnerf.
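The boundary-pinning and gradient-preservation constraints can be pictured with a toy 1D analogue (not the actual NeRF optimization; the variables, weights, and step sizes below are assumptions for the sketch): colors along a line are adjusted so the sample at the boundary matches the source color while the finite-difference gradient stays close to the target's original gradient.

```python
import numpy as np

n = 16
target = np.linspace(0.2, 0.8, n)     # original target appearance along a line
source_color = 1.0                    # source color at the shared boundary (index 0)
orig_grad = np.diff(target)           # gradient to preserve

c = target.copy()
lr, lam = 0.1, 1.0
for _ in range(5000):
    g = np.zeros_like(c)
    g[0] += 2.0 * (c[0] - source_color)       # boundary pinning term
    d = np.diff(c) - orig_grad                # gradient-preservation residual
    g[:-1] += -2.0 * lam * d
    g[1:] += 2.0 * lam * d
    c -= lr * g

# The optimized values keep the target's local variation but are shifted so
# the boundary matches the source, i.e. the source appearance propagates inward.
print(np.round(c[:5], 3))
```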
2022
- Structure-Aware Meta-Fusion for Image Super-Resolution. Haoyu Ma, Bingchen Gong, and Yizhou Yu. ACM Trans. Multimedia Comput. Commun. Appl., Feb 2022
There are two main categories of image super-resolution algorithms: distortion-oriented and perception-oriented. Recent evidence shows that reconstruction accuracy and perceptual quality are typically in disagreement with each other. In this article, we present a new image super-resolution framework that is capable of striking a balance between distortion and perception. The core of our framework is a deep fusion network capable of generating a final high-resolution image by fusing a pair of deterministic and stochastic images using spatially varying weights. To make a single fusion model produce images with varying degrees of stochasticity, we further incorporate meta-learning into our fusion network. Once equipped with the kernel produced by a kernel prediction module, our meta-fusion network is able to produce final images at any desired level of stochasticity. Experimental results indicate that our meta-fusion network outperforms existing state-of-the-art SISR algorithms on widely used datasets, including PIRM-val, DIV2K-val, Set5, Set14, Urban100, Manga109, and B100. In addition, it is capable of producing high-resolution images that achieve low distortion and high perceptual quality simultaneously.
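As a rough illustration of the fusion step (the blending arithmetic only; in the paper the weight map comes from the learned fusion network conditioned on the predicted kernel, and all names below are placeholders), the deterministic and stochastic images are combined per pixel with spatially varying weights:

```python
import numpy as np

H, W = 4, 4
deterministic = np.random.rand(H, W, 3)   # stands in for the regression branch output
stochastic = np.random.rand(H, W, 3)      # stands in for the GAN branch output

def fuse(det, sto, w):
    """Per-pixel convex combination; larger w favors the deterministic branch."""
    return w[..., None] * det + (1.0 - w[..., None]) * sto

# A global "stochasticity level" scaling the weight map mimics how a single
# meta-learned model can target different distortion/perception trade-offs.
base_w = np.random.rand(H, W)
for level in (0.0, 0.5, 1.0):
    fused = fuse(deterministic, stochastic, base_w * (1.0 - level))
    print(level, fused.shape)
```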
2021
- ME-PCN: Point Completion Conditioned on Mask Emptiness. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2021
Point completion refers to completing the missing geometries of an object from incomplete observations. Mainstream methods predict the missing shapes by decoding a global feature learned from the input point cloud, which often leads to deficient results in preserving topology consistency and surface details. In this work, we present ME-PCN, a point completion network that leverages ‘emptiness’ in 3D shape space. Given a single depth scan, previous methods often encode only the occupied partial shapes while ignoring the empty regions (e.g. holes) in depth maps. In contrast, we argue that these ‘emptiness’ clues indicate shape boundaries that can be used to improve topology representation and detail granularity on surfaces. Specifically, our ME-PCN encodes both the occupied point cloud and the neighboring ‘empty points’. It estimates coarse-grained but complete and reasonable surface points in the first stage, followed by a refinement stage that produces fine-grained surface details. Comprehensive experiments verify that ME-PCN performs better both qualitatively and quantitatively than the state-of-the-art. In addition, we show that our ‘emptiness’ design is lightweight and easy to embed in existing methods, where it consistently improves CD and EMD scores.
2018
- Image Super-Resolution via Deterministic-Stochastic Synthesis and Local Statistical Rectification. Weifeng Ge, Bingchen Gong, and Yizhou Yu. ACM Trans. Graph., Dec 2018
Single-image super-resolution has been a popular research topic in the last two decades and has recently received a new wave of interest due to deep neural networks. In this paper, we approach this problem from a different perspective. With respect to a downsampled low-resolution image, we model a high-resolution image as a combination of two components: a deterministic component and a stochastic component. The deterministic component can be recovered from the low-frequency signals in the downsampled image. The stochastic component, on the other hand, contains the signals that have little correlation with the low-resolution image. We adopt two complementary methods for generating these two components. While generative adversarial networks are used for the stochastic component, deterministic component reconstruction is formulated as a regression problem solved using deep neural networks. Since the deterministic component exhibits clearer local orientations, we design novel loss functions tailored to such properties for training the deep regression network. These two methods are first applied to the entire input image to produce two distinct high-resolution images. Afterwards, these two images are fused together using another deep neural network that also performs local statistical rectification, which tries to make the local statistics of the fused image match those of the ground-truth image. Quantitative results and a user study indicate that the proposed method outperforms existing state-of-the-art algorithms by a clear margin.
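A rough sketch of a local-statistics matching penalty in the spirit of the local statistical rectification described above (the paper's exact formulation may differ; the window size and weighting here are assumptions): per-patch means and standard deviations of the fused image are compared against those of the ground truth.

```python
import numpy as np

def box_filter(img, k):
    """Local mean over k x k windows (valid region) via an integral image."""
    c = np.cumsum(np.cumsum(img, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def local_stats_loss(fused, gt, k=8):
    """Penalize differences in local means and standard deviations."""
    mu_f, mu_g = box_filter(fused, k), box_filter(gt, k)
    var_f = box_filter(fused ** 2, k) - mu_f ** 2
    var_g = box_filter(gt ** 2, k) - mu_g ** 2
    sd_f = np.sqrt(np.maximum(var_f, 0.0))
    sd_g = np.sqrt(np.maximum(var_g, 0.0))
    return np.mean((mu_f - mu_g) ** 2) + np.mean((sd_f - sd_g) ** 2)

fused = np.random.rand(64, 64)
gt = np.random.rand(64, 64)
print(local_stats_loss(fused, gt))
```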
2016
- Tamp: A Library for Compact Deep Neural Networks with Structured Matrices. In Proceedings of the 24th ACM International Conference on Multimedia, Dec 2016
We introduce Tamp, an open-source C++ library for reducing the space and time costs of deep neural network models. In particular, Tamp implements several recent works that use structured matrices to replace the unstructured matrices that are often bottlenecks in neural networks. Tamp is also designed to serve as a unified development platform with several supported optimization back-ends and abstracted data types. This paper introduces the design and API, and demonstrates the library's effectiveness with experiments on public datasets.
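To see why structured matrices cut both storage and compute, here is a NumPy illustration (not Tamp's C++ API) using a circulant matrix, one classic choice of structure: the whole matrix is determined by a single vector, and its matrix-vector product can be computed in O(n log n) with the FFT instead of O(n^2).

```python
import numpy as np

n = 256
first_col = np.random.randn(n)            # O(n) storage defines the whole matrix
x = np.random.randn(n)

# Fast path: a circulant matvec is a circular convolution, i.e. an elementwise
# product in Fourier space.
y_fast = np.real(np.fft.ifft(np.fft.fft(first_col) * np.fft.fft(x)))

# Reference: materialize the dense circulant matrix (column j = first_col
# rolled down by j) and multiply the usual O(n^2) way.
dense = np.stack([np.roll(first_col, j) for j in range(n)], axis=1)
y_dense = dense @ x

print(np.allclose(y_fast, y_dense))       # True, up to floating-point error
```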
2015
- Audeosynth: Music-Driven Video Montage. Zicheng Liao, Yizhou Yu, Bingchen Gong, and Lechao Cheng. ACM Trans. Graph., Jul 2015
We introduce music-driven video montage, a media format that offers a pleasant way to browse or summarize video clips collected from various occasions, including gatherings and adventures. In music-driven video montage, the music drives the composition of the video content. According to musical movement and beats, video clips are organized to form a montage that visually reflects the experiential properties of the music. However, creating such a montage takes enormous manual work and artistic expertise. In this paper, we develop a framework for automatically generating music-driven video montages. The input is a set of video clips and a piece of background music. By analyzing the music and video content, our system extracts carefully designed temporal features from the input, casts the synthesis problem as an optimization, and solves for its parameters through Markov chain Monte Carlo sampling. The output is a video montage whose visual activity is cut and synchronized to the rhythm of the music, rendering a symphony of audio-visual resonance.
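The optimization step can be pictured with a generic Metropolis-style MCMC skeleton (the energy, features, and proposal below are toy stand-ins, not the paper's actual terms): each music segment is assigned a clip, and random local changes are accepted or rejected according to how much they lower a mismatch energy.

```python
import math
import random

random.seed(0)
num_segments, num_clips = 6, 10
clip_activity = [random.random() for _ in range(num_clips)]         # placeholder visual feature
segment_intensity = [random.random() for _ in range(num_segments)]  # placeholder musical feature

def energy(assignment):
    """Mismatch between each segment's musical intensity and its clip's activity."""
    return sum((segment_intensity[s] - clip_activity[c]) ** 2
               for s, c in enumerate(assignment))

assignment = [random.randrange(num_clips) for _ in range(num_segments)]
temperature = 0.05
for _ in range(5000):
    proposal = list(assignment)
    proposal[random.randrange(num_segments)] = random.randrange(num_clips)  # local move
    delta = energy(proposal) - energy(assignment)
    if delta < 0 or random.random() < math.exp(-delta / temperature):
        assignment = proposal

print(assignment, round(energy(assignment), 4))
```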