State recognition of food images is a recent topic that is attracting growing interest in the Computer Vision community. A recently presented dataset of food images in different states unfortunately includes no information about the food category. In practical food monitoring applications, it is important to be able to recognize a peeled tomato rather than a generic peeled item. To this end, in this paper we introduce a new dataset covering 20 food categories, drawn from fruits and vegetables, in 11 different states ranging from solid and sliced to creamy paste. We experiment with the most common Convolutional Neural Network (CNN) architectures on three recognition tasks: food categories, food states, and food categories and states jointly. Since a lack of labeled data is common in practical applications, we exploit deep features extracted from CNNs, combined with Support Vector Machines (SVMs), as an alternative to end-to-end classification. We also compare deep features with several hand-crafted features. These experiments confirm that deep features outperform hand-crafted features on all three classification tasks, regardless of the food category or food state considered. Finally, we test the generalization capability of the best-performing deep features on another, publicly available, dataset of food states. This last experiment shows that features extracted from a CNN trained on our proposed dataset achieve performance close to that of the state-of-the-art method. This confirms that our deep features are robust with respect to data never seen by the CNN.
Dataset available upon request.
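As an illustration of how the three recognition tasks relate, the following minimal sketch derives a label for each task from one (category, state) annotation. The integer indices, the names `TaskLabels`, `make_task_labels`, and `split_joint_label`, and the joint encoding itself are assumptions made for this example and are not taken from the dataset. Only the counts of 20 categories and 11 states come from the text above. The joint label space has at most 220 classes, since not every category necessarily appears in every state.

```python
from typing import NamedTuple, Tuple

NUM_CATEGORIES = 20  # number of food categories stated in the abstract
NUM_STATES = 11      # number of food states stated in the abstract


class TaskLabels(NamedTuple):
    """Hypothetical container holding one label per recognition task."""
    category: int  # task 1: food category
    state: int     # task 2: food state
    joint: int     # task 3: food category and state together


def make_task_labels(category: int, state: int) -> TaskLabels:
    """Build the three task labels from zero-based category and state indices.

    The joint encoding (category * NUM_STATES + state) is an assumed
    convention for this sketch, not the dataset's actual labeling scheme.
    """
    if not 0 <= category < NUM_CATEGORIES:
        raise ValueError(f"category index out of range: {category}")
    if not 0 <= state < NUM_STATES:
        raise ValueError(f"state index out of range: {state}")
    return TaskLabels(category, state, category * NUM_STATES + state)


def split_joint_label(joint: int) -> Tuple[int, int]:
    """Recover (category, state) from a joint label built as above."""
    if not 0 <= joint < NUM_CATEGORIES * NUM_STATES:
        raise ValueError(f"joint label out of range: {joint}")
    return divmod(joint, NUM_STATES)
```

With this encoding, a classifier trained on the joint task can still be scored on the two single-attribute tasks by splitting its predictions back into a category index and a state index.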