Benchmarking Algorithms for Food Localization and Semantic Segmentation

Food segmentation is challenging because food is characterized by intrinsically high intra-class variability. In addition, food images taken in the wild may be affected by acquisition artifacts, which can be problematic for segmentation algorithms. A proper evaluation of segmentation algorithms is of paramount importance for the design and improvement of food analysis systems that can work in less-than-ideal real scenarios. In this paper, we evaluate the performance of different deep learning-based segmentation algorithms in the context of food. Due to the lack of large-scale food segmentation datasets, we initially use a dataset composed of 5,000 images of 50 diverse food categories [1]. The images are then accurately annotated with pixel-wise annotations. To test the algorithms under different conditions, the dataset is augmented with the same images rendered under different acquisition distortions: illuminant change, JPEG compression, Gaussian noise, and Gaussian blur. The final dataset is composed of 120,000 images. Using standard benchmark measures, we conducted extensive experiments to evaluate ten state-of-the-art segmentation algorithms on two tasks: food localization and semantic food segmentation.

[1] Chen, M. Y., Yang, Y. H., Ho, C. J., Wang, S. H., Liu, S. M., Chang, E., Yeh, C. H., Ouhyoung, M. Automatic Chinese food identification and quantity estimation. In SIGGRAPH Asia 2012 Technical Briefs (p. 29). ACM (2012).
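
As a rough illustration of the kind of acquisition distortions listed in the abstract, the following minimal sketch applies Gaussian noise and a simple illuminant change to an RGB image stored as a NumPy array. It is not the code distributed in the ZIP file: the function names, the per-channel gain model for the illuminant change, and all parameter values are assumptions made for the example. JPEG compression and Gaussian blur are left out because they would normally rely on an image-processing library.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` (hypothetical helper)."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    # Round and clip back to the valid 8-bit range.
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def change_illuminant(image, gains):
    """Scale each RGB channel by its own gain (an assumed, simplified illuminant model)."""
    scaled = image.astype(np.float64) * np.asarray(gains, dtype=np.float64)
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


# Toy 4x4 mid-gray image standing in for a real food photo.
rng = np.random.default_rng(0)
image = np.full((4, 4, 3), 100, dtype=np.uint8)

noisy = add_gaussian_noise(image, sigma=10.0, rng=rng)  # sigma is an arbitrary choice
warm = change_illuminant(image, gains=(1.2, 1.0, 0.8))  # gains are an arbitrary choice
```

Because the distortions change only pixel values and not geometry, the pixel-wise annotations of the original image can be reused unchanged for each distorted copy.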


Here we provide the ZIP file with the data needed to reproduce the dataset used in the paper. It can be downloaded from this link: (20.6 MiB)

The original dataset must be requested from its owners [1].

Please send an email to this address and the password will be sent to you.

The file contains:

  • The binary masks for the food localization
  • The masks for the semantic food segmentation
  • The file with the labels
  • The file with the list of the images in the test set
  • The code to generate the augmented images
  • The code for the evaluation of the segmentation results
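
To give an idea of how the two kinds of masks might be scored, the sketch below computes intersection over union (IoU) for a binary localization mask and mean IoU over classes for a semantic mask. This is an assumption made for illustration: the page only mentions standard benchmark measures, and the evaluation code in the ZIP file may compute different or additional ones. The function names and the toy arrays are hypothetical.

```python
import numpy as np


def binary_iou(pred, target):
    """IoU between two binary masks (food vs. background)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention assumed here).
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU, skipping classes absent from both masks."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


# Food localization: 1 = food, 0 = background.
pred_mask = np.array([[1, 1, 0], [0, 0, 0]])
true_mask = np.array([[1, 0, 0], [1, 0, 0]])
loc_iou = binary_iou(pred_mask, true_mask)

# Semantic segmentation: each pixel holds a class label.
pred_labels = np.array([[0, 1, 1], [2, 2, 0]])
true_labels = np.array([[0, 1, 2], [2, 2, 0]])
sem_miou = mean_iou(pred_labels, true_labels, num_classes=3)
```

In practice the predicted and ground-truth masks would be loaded from image files and the scores averaged over the images in the test list.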

If you download the dataset, it is assumed that you agree to the following copyright notice:

Copyright (c) 2020 Imaging and Vision Laboratory, University of Milano-Bicocca. All rights reserved.

Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, modify, and distribute this dataset (the images, the results, and the source files) and its documentation for research purposes, provided that the copyright notice in its entirety appears in all copies of this dataset, and that the original source of this dataset, the Imaging and Vision Laboratory (IVL) at the University of Milano-Bicocca (UNIMIB), and the authors are acknowledged in any publication that reports research using this dataset.


The original images belong to:

Chen, M.Y., Yang, Y.H., Ho, C.J., Wang, S.H., Liu, S.M., Chang, E., Yeh, C.H., Ouhyoung, M.: Automatic Chinese food identification and quantity estimation. In: SIGGRAPH Asia 2012 Technical Briefs, p. 29. ACM (2012)