Friday, March 8, 2024

Lecture Series “Robust AI”: Petra Bevandić

On Friday, March 15, we will continue our lecture series “Robust AI”. Petra Bevandić will give a lecture on “Leveraging class relations for multi-dataset semantic segmentation”.

When & where
15.03.2024, 14:15, in CITEC 1.204 and on Zoom
https://uni-bielefeld.zoom-x.de/j/61439312549?pwd=OGxLMmVFeG93VGc1VnV3U3RoR0l6dz09

Abstract
Recent attention in the field has been directed towards training semantic segmentation models on multiple datasets. This trend stems from the increasing demand for models that exhibit robust performance across diverse visual domains. However, a significant obstacle to principled training arises from incompatible labeling policies among established datasets. Take, for instance, the disparity in labeling policies between Cityscapes and Vistas, where the road class in Cityscapes encompasses all driving surfaces, while Vistas delineates separate classes for road markings, manholes, and other elements. Even more challenging is the case of overlapping labels, such as pickups being labeled as trucks in VIPER, cars in Vistas, and vans in ADE20k.
To address these issues, we propose the adoption of a flat universal taxonomy that defines standalone visual concepts within a dataset collection. By expressing dataset-specific labels as unions of universal visual concepts, we facilitate seamless and principled learning on multi-domain dataset collections without a need for any further relabeling. This approach is demonstrated to be applicable in standard classification training based on cross-entropy loss, as well as in the training of mask-based segmentation models like Mask2Former.
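To make the idea of labels as unions of universal concepts more concrete, here is a minimal sketch of how a cross-entropy-style loss could be computed for a single pixel. It is an illustration under assumptions, not the speaker's implementation: the taxonomy UNIVERSAL, the table MAPPINGS, and the function union_nll are hypothetical names, and the class lists only mirror the examples from the abstract (road in Cityscapes versus Vistas, and pickups in VIPER, Vistas, and ADE20k). The assumed loss is the negative log of the summed softmax probability over all universal concepts that a dataset label covers.

```python
import math

# Hypothetical flat universal taxonomy (illustration only).
UNIVERSAL = ["road", "road_marking", "manhole", "car", "truck", "van", "pickup"]

# Hypothetical mappings: each dataset-specific label is a union (set)
# of universal concepts, following the examples in the abstract.
MAPPINGS = {
    "cityscapes": {"road": {"road", "road_marking", "manhole"}},
    "vistas": {
        "road": {"road"},
        "road_marking": {"road_marking"},
        "manhole": {"manhole"},
        "car": {"car", "pickup"},
    },
    "viper": {"truck": {"truck", "pickup"}},
    "ade20k": {"van": {"van", "pickup"}},
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def union_nll(logits, dataset, label):
    """Negative log-likelihood of a dataset label for one pixel.

    The label's probability is assumed to be the sum of the
    probabilities of the universal concepts it is a union of.
    """
    probs = softmax(logits)
    members = MAPPINGS[dataset][label]
    p_label = sum(p for name, p in zip(UNIVERSAL, probs) if name in members)
    return -math.log(p_label)


# A pixel whose logits strongly favor the universal concept "pickup"
# incurs a low loss under all three overlapping dataset labels.
pickup_logits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0]
for dataset, label in [("viper", "truck"), ("vistas", "car"), ("ade20k", "van")]:
    print(dataset, label, union_nll(pickup_logits, dataset, label))
```

In this sketch, no pixel has to be relabeled: a Cityscapes road pixel is consistent with a prediction of road, road_marking, or manhole, while a Vistas road pixel only rewards the narrower road concept.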
Furthermore, we introduce a method for constructing the universal taxonomy for a given dataset collection in a fully automated manner. This is achieved through the iterative integration of dataset pairs based on visual-semantic relations.
Our approach not only achieves competitive within-dataset and cross-dataset generalization but also demonstrates the ability to learn visual concepts not separately labeled in any of the training datasets.