Charles Ruizhongtai Qi
Staff Research Scientist
Waymo LLC
Mountain View, CA
Email: rqi [at] stanford [dot] edu

[Publications]  [Education]  [Experiences]  [Talks]  [Misc] 
[Google Scholar]  [GitHub]  [LinkedIn]

I am a research scientist at Waymo, leading a team that develops ML algorithms for autonomous driving, with a focus on 3D perception and data-driven simulation. Before that I was a postdoctoral researcher at Facebook AI Research (FAIR). I received my Ph.D. from Stanford University (Stanford AI Lab and Geometric Computation Group), advised by Professor Leonidas J. Guibas. Prior to joining Stanford, I got my B.Eng. from Tsinghua University.

My research focuses on deep learning, computer vision and 3D. I have developed novel deep learning architectures for 3D data (point clouds, volumetric grids and multi-view images) with wide applications in 3D object classification, object part segmentation, semantic scene parsing, scene flow estimation and 3D reconstruction. These architectures have been widely adopted by academic and industrial groups across the world. I have also invented several state-of-the-art methods for 3D object recognition, which support current and future applications in augmented reality and robotics. My more recent interests are in scalable and data-efficient perception and data-driven simulation. If you are interested in my research or have any use cases to share, feel free to contact me!

NEW We have 3 papers accepted to CVPR 2023. See you in Vancouver!


MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences, CVPR 2023
Yingwei Li*, Charles R. Qi*, Yin Zhou, Chenxi Liu, Dragomir Anguelov
(*: equal contribution)

We propose a novel method to efficiently leverage long-term temporal sequences for 3D object detection. Our method, MoDAR, uses motion forecasting outputs as a virtual modality to augment LiDAR point clouds, and is applicable to any point-cloud-based detector.

paper / bibtex
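The core idea of MoDAR, appending forecast-derived virtual points to the raw LiDAR cloud before detection, can be sketched in a few lines (a toy illustration only, not the paper's pipeline; all names and shapes here are hypothetical):

```python
import numpy as np

# Toy MoDAR sketch: motion forecasts are rendered as "virtual points" at
# predicted object positions and concatenated with the raw LiDAR points.
rng = np.random.default_rng(0)
lidar = rng.normal(size=(1000, 3))                      # raw LiDAR xyz points
forecast_centers = np.array([[10.0, 2.0, 0.5],          # predicted positions of
                             [-4.0, 7.0, 0.4]])         # two forecasted objects

# Tag each point with a modality flag: 0 = real LiDAR, 1 = virtual (MoDAR).
pts = np.vstack([
    np.hstack([lidar, np.zeros((len(lidar), 1))]),
    np.hstack([forecast_centers, np.ones((len(forecast_centers), 1))]),
])
assert pts.shape == (1002, 4)
```

A downstream point-cloud detector can then consume the augmented cloud unchanged, which is why the approach applies to any point-cloud-based detector.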

GINA-3D: Learning to Generate Implicit Neural Assets in the Wild, CVPR 2023
Bokui Shen, Xinchen Yan, Charles R. Qi, Mahyar Najibi, Boyang Deng, Leonidas Guibas, Yin Zhou, Dragomir Anguelov

We propose a generative model (using tri-plane NeRF, VQGAN and auto-regressive models) that leverages real-world driving data from camera and LiDAR sensors to create realistic 3D implicit neural assets of diverse vehicles and pedestrians. Compared to prior work, GINA-3D tackles the real-world challenges of occlusions, lighting variations and long-tail distributions.

paper / bibtex / dataset (coming soon)

NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors, CVPR 2023
Congyue Deng, Chiyu Max Jiang, Charles R. Qi, Xinchen Yan, Yin Zhou, Leonidas Guibas, Dragomir Anguelov

NeRDi is a single-view NeRF synthesis framework using general image priors from 2D diffusion models. Off-the-shelf vision-language models are used for a two-section language guidance as conditioning inputs to the diffusion model, which helps improve multiview content coherence.

paper / bibtex

LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds, Oral Presentation, ECCV 2022
Minghua Liu, Yin Zhou, Charles R. Qi, Boqing Gong, Hao Su, Dragomir Anguelov

Dense segmentation labels are costly to acquire, especially for 3D point clouds. This work proposes a new labeling and model training pipeline that learns 3D semantic segmentation of LiDAR points with less human labeling.

paper / bibtex

Improving the Intra-class Long-tail in 3D Detection via Rare Example Mining, ECCV 2022
Chiyu (Max) Jiang, Mahyar Najibi, Charles R. Qi, Yin Zhou, Dragomir Anguelov

Continued improvement of machine learning models is critical for practical use cases. While previous works focus on hard example mining, in this study we identify a new conceptual dimension, rareness, along which to mine new data for improving the long-tail performance of models.

paper / bibtex

Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving, ECCV 2022
Mahyar Najibi, Jingwei Ji, Yin Zhou, Charles R. Qi, Xinchen Yan, Scott Ettinger, Dragomir Anguelov

This work is one of the first steps towards building autonomous systems that do not require human supervision. We propose to use motion flows as cues to auto label moving objects in driving scenes, and then learn from those auto labels to detect objects and predict their motions.

paper / bibtex

LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds, ECCV 2022
Chenxi Liu, Zhaoqi Leng, Pei Sun, Shuyang Cheng, Charles R. Qi, Yin Zhou, Mingxing Tan, Dragomir Anguelov

This paper proposes a unified framework for NAS of deep nets for 3D point clouds, providing novel perspectives to understand the relationships among many popular network architectures.

paper / bibtex

RIDDLE: Lidar Data Compression with Range Image Deep Delta Encoding, CVPR 2022
Xuanyu Zhou*, Charles R. Qi*, Yin Zhou, Dragomir Anguelov (*: equal contribution)

As LiDAR sensors become more powerful (higher resolutions), the data storage and transmission costs grow quickly. This paper proposes a state-of-the-art method for LiDAR range image compression.

paper / bibtex
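The delta-encoding idea can be sketched with the simplest possible predictor, the left neighbor in the range image (RIDDLE uses a learned deep model as the predictor; this toy version only illustrates why residuals on smooth surfaces compress well):

```python
import numpy as np

def delta_encode(range_image):
    """Store each range pixel as the difference from its left neighbor.
    On smooth surfaces the residuals are near-constant and near-zero,
    which an entropy coder compresses far better than raw depths."""
    return np.diff(range_image, axis=1, prepend=0.0)

def delta_decode(deltas):
    """Invert the encoding by a cumulative sum along each scan row."""
    return np.cumsum(deltas, axis=1)

img = np.linspace(5.0, 10.0, 64).reshape(4, 16)  # smooth toy range image
d = delta_encode(img)
assert np.allclose(delta_decode(d), img)          # lossless round trip
```

Within each row the residuals are identical here, so the encoded signal has almost no entropy; a learned predictor extends the same gain to real, less regular scans.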

Multi-Class 3D Object Detection with Single-Class Supervision, ICRA 2022
Mao Ye, Chenxi Liu, Maoqing Yao, Weiyue Wang, Zhaoqi Leng, Charles R. Qi, Dragomir Anguelov

Train multi-class 3D object detectors with single-class labels on disjoint data samples (e.g. some frames have labels for vehicles and some frames have labels for pedestrians).

paper / bibtex

Revisiting 3D Object Detection From an Egocentric Perspective, NeurIPS 2021
Boyang Deng, Charles R. Qi, Mahyar Najibi, Thomas Funkhouser, Yin Zhou, Dragomir Anguelov

We revisit how 3D object detection is evaluated in the context of autonomous driving, and propose a new metric based on support distances and a new shape representation: amodal contours.

paper / bibtex

SPG: Unsupervised Domain Adaptation for 3D Object Detection via Semantic Point Generation, ICCV 2021
Qiangeng Xu, Yin Zhou, Weiyue Wang, Charles R. Qi, Dragomir Anguelov

We propose a network that completes an object's geometry via semantic point generation, achieving significant improvements on out-of-domain data (with rainy weather) and state-of-the-art results on KITTI.

paper / bibtex

Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset, ICCV 2021
S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, B. Sapp, Charles R. Qi, Y. Zhou, Z. Yang, A. Chouard, P. Sun, J. Ngiam, V. Vasudevan, A. McCauley, J. Shlens, D. Anguelov

A motion forecasting dataset with 100,000 scenes, each 20 seconds long at 10 Hz: more than 570 hours of unique data over 1,750 km of roadways. 3D maps and high-quality auto labels are provided. We hope that this new large-scale interactive motion dataset will provide new opportunities for advancing research in motion forecasting and autonomous driving.

paper / dataset / bibtex

Offboard 3D Object Detection from Point Cloud Sequences, CVPR 2021
Charles R. Qi, Yin Zhou, Mahyar Najibi, Pei Sun, Khoa Vo, Boyang Deng, Dragomir Anguelov

While current 3D object recognition research mostly focuses on the real-time, onboard scenario, there are many offboard use cases of perception that are largely under-explored, such as using machines to automatically generate high-quality 3D labels. In this paper, we propose a novel offboard 3D object detection pipeline using point cloud sequence data.

This work was used to auto label the Waymo Open Motion Dataset. Feel free to check it out!

paper / blog post / bibtex / talk

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding, Spotlight, ECCV 2020
Saining Xie, Jiatao Gu, Demi Guo, Charles R. Qi, Leonidas J. Guibas, Or Litany

Local contrastive learning for 3D representation learning. The representations, learned without supervision, generalize across tasks and improve several high-level semantic understanding problems, ranging from segmentation to detection, on six different datasets.

paper / code / bibtex

ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes, CVPR 2020
Charles R. Qi*, Xinlei Chen*, Or Litany, Leonidas J. Guibas (*: equal contribution)

An extension of VoteNet that leverages RGB images. By lifting 2D image votes to 3D, RGB images provide strong geometric cues for 3D object localization and pose estimation, while their textures and colors provide semantic cues. A special multi-tower training scheme also makes the 2D-3D feature fusion more effective.

paper / bibtex / code

Deep Hough Voting for 3D Object Detection in Point Clouds, Oral Presentation, ICCV 2019
Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas

Best Paper Award Nomination (one of the seven among 1,075 accepted papers) [link]

We revive generalized Hough voting in the era of deep learning for the task of 3D object detection in point clouds. Our voting-based detection network (VoteNet) is both fast and top-performing.

paper / bibtex / code / talk
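A minimal sketch of the voting idea: surface points cast votes toward their object's center, and a dense cluster of votes indicates a detection. This toy version clusters votes on a coarse grid; in VoteNet the offsets are predicted by a network and the vote aggregation is learned:

```python
import numpy as np

def vote_and_cluster(points, offsets, cell=0.5):
    """Each point casts a vote (point + predicted offset toward its
    object center); votes are binned on a coarse grid and the densest
    bin yields a center estimate."""
    votes = points + offsets
    keys = np.floor(votes / cell).astype(int)
    _, inv, counts = np.unique(keys, axis=0,
                               return_inverse=True, return_counts=True)
    best = counts.argmax()
    return votes[inv.ravel() == best].mean(axis=0)

rng = np.random.default_rng(1)
center = np.array([2.1, 0.2, 1.1])
pts = center + rng.uniform(-0.4, 0.4, size=(200, 3))      # points on the object
noisy_offsets = (center - pts) + rng.normal(0.0, 0.02, size=pts.shape)
est = vote_and_cluster(pts, noisy_offsets)
assert np.allclose(est, center, atol=0.05)
```

Averaging the votes in the densest bin cancels most of the per-point offset noise, which is what makes voting robust even when individual predictions are imprecise.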

KPConv: Flexible and Deformable Convolution for Point Clouds, ICCV 2019
Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Francois Goulette, Leonidas J. Guibas

Proposed a point-centric approach to deep learning on 3D point clouds with kernel point convolution (KPConv), where a convolution kernel is defined as a set of spatially localized and deformable points.

paper / bibtex / code
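A toy single-location version of the kernel point idea: each neighbor's feature is weighted by a linear correlation with a set of kernel points, then passed through that kernel point's weight matrix. Shapes and names here are illustrative, not the released implementation:

```python
import numpy as np

def kp_conv(center, neighbors, feats, kernel_pts, kernel_w, sigma=0.5):
    """Kernel point convolution at one query location.

    neighbors: (N, 3) points near `center`; feats: (N, Fin) features.
    kernel_pts: (K, 3) kernel point positions (relative to the center).
    kernel_w: (K, Fin, Fout) one weight matrix per kernel point.
    """
    rel = neighbors - center                                  # (N, 3)
    d = np.linalg.norm(rel[:, None, :] - kernel_pts[None, :, :], axis=-1)
    corr = np.maximum(0.0, 1.0 - d / sigma)                   # (N, K) linear correlation
    out = np.zeros(kernel_w.shape[-1])
    for k in range(len(kernel_pts)):
        # Correlation-weighted sum of neighbor features, then project.
        out += (corr[:, k:k + 1] * feats).sum(axis=0) @ kernel_w[k]
    return out

rng = np.random.default_rng(2)
neighbors = rng.uniform(-0.3, 0.3, size=(32, 3))
feats = rng.normal(size=(32, 4))
kernel_pts = rng.uniform(-0.3, 0.3, size=(8, 3))   # rigid kernel layout
kernel_w = rng.normal(size=(8, 4, 6))
out = kp_conv(np.zeros(3), neighbors, feats, kernel_pts, kernel_w)
assert out.shape == (6,)
```

The deformable variant in the paper additionally predicts per-location shifts of `kernel_pts`, letting the kernel adapt to local geometry.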

Generating 3D Adversarial Point Clouds, CVPR 2019
Chong Xiang, Charles R. Qi, Bo Li

Proposed several novel algorithms to craft adversarial point clouds against 3D deep learning models via adversarial point perturbation and adversarial point generation.

paper / bibtex / code

FlowNet3D: Learning Scene Flow in 3D Point Clouds, CVPR 2019
Xingyu Liu*, Charles R. Qi*, Leonidas Guibas (*: equal contribution)

Proposed a novel deep neural network that learns scene flow from point clouds in an end-to-end fashion.

paper / bibtex / code

Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks, ICML 2018
Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

We studied how to parallelize the training of deep convolutional networks beyond simple data or model parallelism, and proposed layer-wise parallelism that allows each layer in a network to use an individual parallelization strategy.

paper / bibtex

Frustum PointNets for 3D Object Detection from RGB-D Data, CVPR 2018
Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J. Guibas

Proposed a novel framework for 3D object detection with image region proposals (lifted to 3D frustums) and PointNets. Our method is simple, efficient and effective, ranking first on the KITTI 3D object detection benchmark across all categories (as of 11/27/2017).

paper / bibtex / code / website

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, NIPS 2017
Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas

Proposed a hierarchical neural network on point sets that captures local context. Compared with PointNet, PointNet++ achieves better performance and generalizability in complex scenes and is able to deal with non-uniform sampling density.

paper / bibtex / code / website / poster
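PointNet++'s hierarchy starts by picking well-spread centroids with farthest point sampling, then groups neighbors around each centroid and applies a small PointNet per group. The sampling step can be sketched as follows (a naive O(N*k) version for illustration, not the released code):

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedily pick k centroids, each time taking the point farthest
    from everything chosen so far, so the samples cover the set evenly
    even under non-uniform density."""
    idx = [0]                                             # start anywhere
    dist = np.linalg.norm(points - points[0], axis=1)     # dist to chosen set
    for _ in range(k - 1):
        nxt = int(dist.argmax())                          # farthest remaining point
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(idx)

pts = np.random.default_rng(3).normal(size=(256, 3))
centroids = farthest_point_sampling(pts, 16)
assert len(set(centroids.tolist())) == 16                 # 16 distinct centroids
```

Because a chosen point's distance drops to zero, no centroid is ever picked twice, and the greedy rule naturally adapts the sampling to the point density.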

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Oral Presentation, CVPR 2017
Charles R. Qi*, Hao Su*, Kaichun Mo, and Leonidas J. Guibas (*: equal contribution)

Proposed novel neural networks to directly consume an unordered point cloud as input, without converting to other 3D representations such as voxel grids first. Rich theoretical and empirical analyses are provided.

paper / bibtex / code / website / presentation video
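The key construction, a shared per-point function followed by symmetric max pooling, makes the global feature invariant to the input point order. A minimal numpy sketch (a single random linear layer stands in for the shared MLP):

```python
import numpy as np

def pointnet_global_feature(points, weights):
    """Toy PointNet: a shared per-point transform followed by max pooling.

    points:  (N, 3) array of xyz coordinates (unordered).
    weights: (3, F) matrix standing in for the shared per-point MLP.
    Returns an order-invariant (F,) global feature.
    """
    per_point = np.maximum(points @ weights, 0.0)  # shared transform + ReLU
    return per_point.max(axis=0)                   # symmetric function: max pool

rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))
w = rng.normal(size=(3, 16))
feat = pointnet_global_feature(pts, w)

# Permuting the (unordered) points leaves the global feature unchanged.
shuffled = pts[rng.permutation(len(pts))]
assert np.allclose(feat, pointnet_global_feature(shuffled, w))
```

Any symmetric aggregation would give permutation invariance; the paper's analysis shows max pooling is both simple and, with enough feature dimensions, a universal approximator of set functions.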

Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis, Spotlight Presentation, CVPR 2017
Angela Dai, Charles R. Qi, Matthias Niessner

A data-driven approach to complete partial 3D shapes through a combination of volumetric deep neural networks and 3D shape synthesis.

paper / bibtex / website (code & data available)

Volumetric and Multi-View CNNs for Object Classification on 3D Data, Spotlight Presentation, CVPR 2016
Charles R. Qi*, Hao Su*, Matthias Niessner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas (*: equal contribution)

Novel architectures for 3D CNNs that take volumetric or multi-view representations as input.

paper / bibtex / code / website / supp / presentation video

FPNN: Field Probing Neural Networks for 3D Data, NIPS 2016
Yangyan Li, Soeren Pirk, Hao Su, Charles R. Qi, and Leonidas J. Guibas

A very efficient 3D deep learning method for volumetric data processing that takes advantage of data sparsity in 3D fields.

paper / bibtex / code / website

Joint Embeddings of Shapes and Images via CNN Image Purification, SIGGRAPH Asia 2015
Yangyan Li*, Hao Su*, Charles R. Qi, Noa Fish, Daniel Cohen-Or, and Leonidas J. Guibas (*: equal contribution)

Cross-modality learning of 3D shapes and 2D images by neural networks. A joint embedding space is constructed that is sensitive to 3D geometry differences but agnostic to other nuisances.

paper / bibtex / code / website / live demo

Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views, Oral Presentation, ICCV 2015
Hao Su*, Charles R. Qi*, Yangyan Li, Leonidas J. Guibas (*equal contribution)

Pioneering work showing that large-scale synthetic data rendered from virtual worlds can greatly benefit deep learning in the real world. Delivers a state-of-the-art viewpoint estimator.

paper / bibtex / code / website / presentation video