Sustainability, Vol. 15, Pages 2442: Improving NoSQL Spatial-Query Processing with Server-Side In-Memory R*-Tree Indexes for Spatial Vector Data
Sustainability doi: 10.3390/su15032442
Authors: Lele Sun Baoxuan Jin
Geospatial databases are basic tools to collect, index, and manage georeferenced data indicators in sustainability research for efficient, long-term analysis. NoSQL databases are increasingly applied to manage the ever-growing massive spatial vector data (SVD) with their changeable data schemas, agile scalability, and fast query response time. Spatial queries are basic operations in geospatial databases. According to Green information technology, an efficient spatial index can accelerate query processing and save power consumption for ubiquitous spatial applications. Current solutions tend to pursue it by indexing spatial objects with space-filling curves or geohash on NoSQL databases. As for the performance-wise R-tree family, they are mainly used in slow disk-based spatial access methods on NoSQL databases that incur high loading and searching costs. Therefore, performing spatial queries efficiently with the R-tree family on NoSQL databases remains a challenge. In this paper, an in-memory balanced and distributed R*-tree index named the BDRST index is proposed and implemented on HBase for efficient spatial-query processing of massive SVD. The BDRST index stores and distributes serialized R*-trees to HBase regions in association with SVD partitions in the same table. Moreover, an efficient optimized server-side parallel processing framework is presented for real-time R*-tree instantiation and query processing. Through extensive experiments on real-world land-use data sets, the performance of our method is tested, including index building, index quality, spatial queries, and applications. Our proposed method outperforms other state-of-the-art solutions, saving between 27.36% and 95.94% on average execution time for the above operations. Experimental results show the capability of the BDRST index to support spatial queries over large-scale SVD, and our method provides a solution for efficient sustainability research that involves massive georeferenced data.