Semantic segmentation architectures are mainly built upon an encoder-decoder structure. These models perform successive downsampling operations in the encoder. Since
operations on high-resolution activation maps are computationally expensive, the
decoder usually produces output segmentation maps by upsampling with parameter-free operators such as bilinear or nearest-neighbor interpolation. We propose a neural network named Guided Upsampling Network, which consists of a multiresolution architecture that jointly exploits
high-resolution and large-context information. We then introduce a new module named
Guided Upsampling Module (GUM) that enriches upsampling operators by introducing a
learnable transformation for semantic maps. It can be plugged into any existing encoder-decoder architecture with few modifications and low additional computation cost. We
show with quantitative and qualitative experiments how our network benefits from the
use of GUM. A comprehensive set of experiments on the publicly available
Cityscapes dataset demonstrates that Guided Upsampling Network can efficiently process high-resolution images in real time while attaining state-of-the-art performance.
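
To make the contrast between a parameter-free operator and a guided one concrete, the following is a minimal NumPy sketch rather than the network's actual implementation. It assumes, for illustration only, that the learnable transformation can be represented as a per-pixel table of sampling offsets. The function names and the `offsets` array are hypothetical; in a real model the offsets would be predicted by a trained branch, whereas here they are set by hand.

```python
import numpy as np

def nearest_upsample(low, factor):
    """Parameter-free nearest-neighbor upsampling of an (h, w) map."""
    h, w = low.shape
    ys = np.arange(h * factor) // factor
    xs = np.arange(w * factor) // factor
    return low[ys[:, None], xs[None, :]]

def guided_nearest_upsample(low, factor, offsets):
    """Nearest-neighbor upsampling with a per-pixel shifted sampling grid.

    offsets has shape (2, h*factor, w*factor) and holds (dy, dx) shifts
    expressed in low-resolution pixels. This layout is an assumption made
    for the sketch; a network would predict the values.
    """
    h, w = low.shape
    H, W = h * factor, w * factor
    # Regular sampling coordinates, shifted by the guidance offsets.
    ys = np.floor(np.arange(H)[:, None] / factor + offsets[0])
    xs = np.floor(np.arange(W)[None, :] / factor + offsets[1])
    # Clamp to the valid range so shifted samples never leave the map.
    ys = np.clip(ys, 0, h - 1).astype(int)
    xs = np.clip(xs, 0, w - 1).astype(int)
    return low[ys, xs]

low_res = np.array([[0, 1],
                    [2, 3]])
offsets = np.zeros((2, 4, 4))
offsets[1, 0, 1] = 1.0  # shift one output pixel's sample one column right
print(nearest_upsample(low_res, 2))
print(guided_nearest_upsample(low_res, 2, offsets))
```

With all offsets set to zero the guided operator reduces to plain nearest-neighbor upsampling, so in this sketch the guidance acts purely as a correction on top of the parameter-free operator.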

Real-Time Semantic Segmentation

Demo Video

Publications