IROS 2020 | DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth and Ego-motion from Monocular Videos

  • 2020.10.25
  • News
AIRS has 9 papers accepted by IROS 2020, and two of them are award finalists.
We will introduce these papers during IROS 2020. Below we present DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth and Ego-motion from Monocular Videos.

Research Background

Depth and ego-motion estimation is a core problem in SLAM. Learning-based methods are more promising than traditional methods in some long-standing difficult cases, such as textureless scenes, and have thus become popular in recent years.

Unsupervised learning of depth and ego-motion from unlabeled monocular videos does not depend on expensive ground truth and therefore generalizes better.

Unsupervised learning assumes that the scene is static and visible from different views, and uses the photometric errors of between-view reconstruction as the objective function for training the networks (Fig. 1). It therefore suffers from scene dynamics and occlusion.


Fig. 1: The Pipeline of Unsupervised Learning of Depth and Ego-motion and Effect of Outlier Masking
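To make the objective concrete, the sketch below computes a per-pixel photometric error between a target frame and views reconstructed from neighbouring frames, and takes the per-pixel minimum over source views, a common way (popularized by Monodepth2) to soften occlusion effects. This is an illustrative numpy sketch under simplifying assumptions, not the paper's exact loss: real pipelines typically mix an SSIM term with the L1 term and operate on network-warped images.

```python
import numpy as np

def photometric_error(target, recon):
    """Per-pixel L1 photometric error between the target frame and a
    view reconstructed (warped) from a neighbouring frame.
    Images are H x W x 3 arrays; the error is averaged over channels.
    (Illustrative only: an SSIM term is usually mixed in as well.)"""
    return np.abs(target - recon).mean(axis=-1)

def min_reprojection(target, recons):
    """Per-pixel minimum of the photometric error over several
    source-view reconstructions, so a pixel occluded in one source
    view can still be explained by another."""
    errors = np.stack([photometric_error(target, r) for r in recons])
    return errors.min(axis=0)
```

Pixels that violate the static-scene assumption cannot be explained by any source view, so even this minimum stays large there, which is exactly what the outlier masking below exploits.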


The proposed outlier masking technique lets us handle moving vehicles, especially oncoming ones, better in unsupervised monocular depth estimation. Recent state-of-the-art methods, Struct2depth, EPC++ and Monodepth2, underestimate the depth of the oncoming car, as illustrated in Fig. 2, while our DiPE handles this case better. Our key observation is that occluded and moving regions always produce more significant photometric errors: these regions violate the assumption and thus cannot be well reconstructed.

Fig. 2: Handling oncoming moving objects.

Technically, we exclude these regions from the learning objective as statistical outliers; the effect is shown on the right of Fig. 1. Combined with another proposed technique, a weighted multi-scale scheme, we handle the artifacts in monocular depth estimation better than our baseline, as shown in Fig. 3.
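The idea of treating badly reconstructed regions as statistical outliers can be sketched as below. The specific rule here, an interquartile-range fence over the per-pixel errors, is an illustrative choice of outlier criterion; the paper's exact statistical rule may differ.

```python
import numpy as np

def outlier_mask(errors, k=1.5):
    """Keep-mask for per-pixel photometric errors.
    Pixels whose error exceeds Q3 + k * IQR are treated as outliers
    (likely occluded or moving regions) and excluded from the loss.
    The IQR fence is an illustrative criterion, not necessarily the
    paper's exact rule."""
    q1, q3 = np.percentile(errors, [25, 75])
    upper = q3 + k * (q3 - q1)
    return errors <= upper  # True = pixel contributes to the loss

def masked_mean(errors, mask):
    """Average the photometric errors over the kept pixels only."""
    return float(errors[mask].mean())
```

Because occluded and moving regions produce the largest errors, masking the upper tail of the error distribution removes most of their gradient contribution without needing explicit motion segmentation.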

Fig. 3: Reducing artifacts.
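The weighted multi-scale scheme combines the losses computed at the decoder's different output resolutions with unequal weights rather than a plain average. The sketch below uses geometrically decaying weights for coarser scales as one plausible instantiation; the weighting actually used in DiPE may differ.

```python
import numpy as np

def weighted_multiscale_loss(scale_losses, decay=0.5):
    """Combine per-scale photometric losses with weights that shrink
    for coarser scales, so low-resolution predictions (which tend to
    cause artifacts) influence training less.
    scale_losses[0] is the finest scale. The geometric decay is an
    illustrative choice, not necessarily the paper's weighting."""
    weights = np.array([decay ** i for i in range(len(scale_losses))])
    weights /= weights.sum()  # normalize so the weights sum to 1
    return float(np.dot(weights, scale_losses))
```

Down-weighting the coarse scales keeps their regularizing effect during early training while letting the full-resolution loss dominate the final depth quality.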

The first author of this article is Hualie Jiang. He is currently a final-year PhD student at The Chinese University of Hong Kong, Shenzhen, advised by Prof. Rui Huang. During his PhD, he has conducted research on depth estimation in different scenarios, including supervised indoor depth estimation, unsupervised outdoor depth estimation from monocular videos, and spherical panorama depth estimation.

The corresponding author, Prof. Rui Huang, is an Associate Professor at The Chinese University of Hong Kong, Shenzhen. Prof. Huang has worked on various research topics including subspace analysis, deformable models, probabilistic graphical models, and their applications in computer vision, pattern recognition, and medical imaging. His more recent research focuses on machine learning methods for intelligent video surveillance, including pedestrian detection, tracking, and identification. Currently he is working on vision for robotics. He has published more than 50 papers in related areas and has been the principal investigator of various research grants.