In this paper, we present an extensive evaluation of visual descriptors for the content-based retrieval of remote sensing (RS) images. The evaluation includes global, local, and Convolutional Neural Network (CNN) features coupled with four different content-based image retrieval schemes. We conducted all the experiments on two publicly available datasets: the 21-class UC Merced Land Use/Land Cover (LandUse) dataset and the 19-class High-resolution Satellite Scene dataset (SceneSat). The content of RS images can be quite heterogeneous, ranging from fine-grained textures to coarse-grained ones to images containing objects. It is therefore not obvious, in this domain, which descriptor should be employed to describe images with such variability. The results demonstrate that CNN-based and local features perform best regardless of the retrieval scheme adopted. In particular, CNN-based descriptors are more effective than local descriptors on the LandUse dataset and less effective than local descriptors on the SceneSat dataset.
Pre-computed visual descriptors:
All the visual descriptors for each dataset can be found at: