- The full research paper on GeoSpark has been accepted by the GeoInformatica journal. The paper spends over 40 pages dissecting GeoSpark in detail and comparing it with other existing systems such as Magellan, Simba, and SpatialHadoop.
- GeoSpark 1.1.3 is released. This release contains a critical bug fix for the GeoSpark-core RDD API. Release notes | Maven Coordinate.
- GeoSpark 1.1.2 is released. This release contains several bug fixes. Thanks to Lucas C. for the patch! Release notes | Maven Coordinate.
Companies are using GeoSpark
Please make a Pull Request to add yourself!
GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark / SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.
GeoSpark contains three modules:
|Name|API|Spark compatibility|Dependency|
|---|---|---|---|
|GeoSpark-SQL|SQL/DataFrame|SparkSQL 2.1 and later|Spark-core, Spark-SQL, GeoSpark-core|
|GeoSpark-Viz|RDD|Spark 2.X/1.X|Spark-core, GeoSpark-core|
- Core: GeoSpark SpatialRDDs and Query Operators.
- SQL: SQL interfaces for GeoSpark core.
- Viz: Visualization extension of GeoSpark core.
The GeoSpark development team has published four papers about GeoSpark; see Publications.
GeoSpark was evaluated in the PVLDB 2018 paper "How Good Are Modern Spatial Analytics Systems?" by Varun Pandey, Andreas Kipf, Thomas Neumann, and Alfons Kemper (Technical University of Munich), which states:
GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases.
- Spatial RDD
- Spatial SQL
SELECT superhero.name FROM city, superhero WHERE ST_Contains(city.geom, superhero.geom) AND city.name = 'Gotham';
- Complex geometries / trajectories: point, polygon, linestring, multi-point, multi-polygon, multi-linestring, GeometryCollection
- Various input formats: CSV, TSV, WKT, WKB, GeoJSON, NASA NetCDF/HDF, Shapefile (.shp, .shx, .dbf; the file extensions must be in lower case)
- Spatial query: range query, range join query, distance join query, K Nearest Neighbor query
- Spatial index: R-Tree, Quad-Tree
- Spatial partitioning: KDB-Tree, Quad-Tree, R-Tree, Voronoi diagram, Hilbert curve, Uniform grids
- Coordinate Reference System / Spatial Reference System Transformation: for example, from WGS84 (EPSG:4326, degree-based) to EPSG:3857 (meter-based)
- High-resolution map: scatter plot, heat map, choropleth map
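To make the coordinate reference system transformation in the list above concrete, here is a minimal sketch in plain Python of the standard spherical Web Mercator projection from WGS84 degrees (EPSG:4326) to EPSG:3857 meters. It does not use GeoSpark or call any of its APIs; the function name `wgs84_to_web_mercator` and the constants are illustrative assumptions, not part of the library.

```python
import math

# Sphere radius used by EPSG:3857 (equal to the WGS84 semi-major axis), in meters.
EARTH_RADIUS_M = 6378137.0
# Web Mercator is only defined up to roughly this latitude (hypothetical constant name).
MAX_LATITUDE_DEG = 85.0511287798


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project a (longitude, latitude) pair in degrees to EPSG:3857 meters."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180] degrees")
    if abs(lat_deg) > MAX_LATITUDE_DEG:
        raise ValueError("latitude is outside the Web Mercator range")
    # x grows linearly with longitude.
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    # y stretches with latitude according to the Mercator formula.
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0)
    )
    return x, y


if __name__ == "__main__":
    # Example: a point given in degrees, printed in meters.
    print(wgs84_to_web_mercator(-74.0, 40.7))
```

Distances computed on the projected coordinates are in meters but become increasingly distorted away from the equator, which is worth keeping in mind when choosing a meter-based target system for distance joins.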
GeoSpark Visualization Extension (GeoSparkViz)
GeoSparkViz is a large-scale in-memory geospatial visualization system.
GeoSparkViz provides native support for general cartographic design by extending GeoSpark to process large-scale spatial data. It can visualize Spatial RDDs and spatial queries and render very high-resolution images in parallel.
More details are available here: GeoSpark Visualization Extension
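As a rough illustration of how a heat map can be rendered in parallel, the sketch below counts points per pixel for each data partition and then sums the partial grids. This is plain Python written under assumptions of our own, not GeoSparkViz code: the function names `rasterize_counts` and `merge_grids` are hypothetical, and the actual GeoSparkViz pipeline may work differently.

```python
def rasterize_counts(points, bounds, width, height):
    """Count how many points fall into each pixel of a width x height grid.

    bounds is (min_x, min_y, max_x, max_y); row 0 is the top of the image.
    """
    min_x, min_y, max_x, max_y = bounds
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        # Skip points outside the map extent.
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Clamp so that points on the max edge land in the last pixel.
        col = min(int((x - min_x) / (max_x - min_x) * width), width - 1)
        row = min(int((max_y - y) / (max_y - min_y) * height), height - 1)
        grid[row][col] += 1
    return grid


def merge_grids(a, b):
    """Sum two partial pixel grids of the same shape."""
    return [[ca + cb for ca, cb in zip(ra, rb)] for ra, rb in zip(a, b)]


if __name__ == "__main__":
    bounds = (0.0, 0.0, 2.0, 2.0)
    # Two lists stand in for two partitions of a distributed dataset.
    part1 = [(0.5, 0.5), (1.5, 1.5)]
    part2 = [(2.0, 2.0), (3.0, 3.0)]
    merged = merge_grids(
        rasterize_counts(part1, bounds, 2, 2),
        rasterize_counts(part2, bounds, 2, 2),
    )
    print(merged)
```

Because summation is associative and commutative, the partial grids can be merged in any order, which is what makes this kind of rendering easy to distribute; a color ramp applied to the merged counts would then produce the final image.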