Three for one and one for three: Flow, Segmentation, and Surface Normals

Hoang-An Le, Anil Baslamisli, Thomas Mensink, Theo Gevers
Computer Vision Group, Informatics Institute, University of Amsterdam


Figure 1 Visualization of the modalities on the Virtual KITTI (top) and Nature (bottom) datasets. From left to right: RGB image, semantic label annotation, color-coded optical flow, corresponding flow magnitude, surface normals.

Abstract

Optical flow, semantic segmentation, and surface normals represent different information modalities, yet together they provide better cues for scene understanding problems. In this paper, we study the influence among the three modalities: how each one affects the others and how effective they are in combination. We employ a modular approach using a convolutional refinement network that is trained with supervision but isolated from RGB images to enforce joint modality features. To assist the training process, we create a large-scale synthetic outdoor dataset that supports dense annotation of semantic segmentation, optical flow, and surface normals. The experimental results show a positive influence among the three modalities, especially for object boundaries, region consistency, and scene structures.

Figure 2 Overview of the relationship between the three modalities: optical flow, segmentation, and surface normals.

Source code

GitHub

Data

Parts of the Virtual KITTI dataset are included in this repository for your convenience. If you use them, please cite the corresponding paper following the instructions at NAVER LABS EUROPE.

Ground truth surface normals are converted from the provided depth images using the method given by Barron and Malik in Shape, Illumination, and Reflectance from Shading. The source code is available on the project page. Please cite the corresponding papers if you use the code in your research.
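To illustrate what a depth-to-normals conversion involves, here is a minimal sketch. It is not the Barron and Malik implementation referenced above. It is a generic finite-difference approach that assumes a pinhole camera with known intrinsics. The function name `normals_from_depth` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical, and the code is plain Python for readability (a practical version would be vectorized).

```python
import math


def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate unit surface normals from a depth map (illustrative sketch).

    depth: list of rows of positive depth values along the optical axis.
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point).
    Returns a list of rows of (nx, ny, nz) tuples oriented towards the camera.
    """
    h, w = len(depth), len(depth[0])

    def point(u, v):
        # Back-project pixel (u, v) to a 3D point in camera coordinates.
        z = depth[v][u]
        return ((u - cx) * z / fx, (v - cy) * z / fy, z)

    normals = []
    for v in range(h):
        row = []
        for u in range(w):
            # Neighbour indices for finite differences, clamped at the borders.
            u0, u1 = max(u - 1, 0), min(u + 1, w - 1)
            v0, v1 = max(v - 1, 0), min(v + 1, h - 1)
            a, b = point(u0, v), point(u1, v)
            c, d = point(u, v0), point(u, v1)
            # Tangent vectors along the image x and y directions.
            tu = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
            tv = (d[0] - c[0], d[1] - c[1], d[2] - c[2])
            # The normal is the cross product of the two tangents.
            n = (tu[1] * tv[2] - tu[2] * tv[1],
                 tu[2] * tv[0] - tu[0] * tv[2],
                 tu[0] * tv[1] - tu[1] * tv[0])
            length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
            if length == 0.0:
                # Degenerate neighbourhood (e.g. a one-pixel-wide image).
                row.append((0.0, 0.0, 0.0))
                continue
            n = (n[0] / length, n[1] / length, n[2] / length)
            # Flip the normal so that it faces the camera at the origin.
            p = point(u, v)
            if n[0] * p[0] + n[1] * p[1] + n[2] * p[2] > 0:
                n = (-n[0], -n[1], -n[2])
            row.append(n)
        normals.append(row)
    return normals


# Usage: a fronto-parallel plane at constant depth.
flat = [[2.0] * 3 for _ in range(3)]
flat_normals = normals_from_depth(flat, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```

For the constant-depth plane in the usage example, every normal points straight back along the optical axis towards the camera. Real depth maps are noisy, so a practical pipeline would typically smooth the depth or fit local planes before differentiating.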

Virtual KITTI
- Optical flow: ground truth (7 GB), predicted (52 GB)
- Semantic segmentation: ground truth (151 MB), predicted (315 GB)
- Surface normals: ground truth (63 GB), predicted (75 GB)

Nature
- Optical flow: ground truth (32 GB), predicted (15 GB)
- Semantic segmentation: ground truth (114 MB), predicted (107 GB)
- Surface normals: ground truth (37 GB), predicted (27 GB)

Paper

arXiv (up-to-date) | BMVC 2018 | poster

Oral presentation

Citation

If you find the material useful, please consider citing our work:

@inproceedings{le18bmvc,
 author = {Le, Hoang{-}An and Baslamisli, Anil and Mensink, Thomas and Gevers, Theo},
 title = {{Three for one and one for three: Flow, Segmentation, and Surface Normals}},
 booktitle = {Proceedings of the British Machine Vision Conference (BMVC)},
 year = {2018},
}