Beyond PostGIS: The Rise of the Spatial Lakehouse
For many GIS teams, spatial data still begins in a database. Parcels, roads, buildings, utility networks, and field edits are stored, checked, and served from systems that many people can use at the same time. In this world, PostGIS became one of the most trusted foundations for operational GIS.
That role is still important. PostGIS is excellent when the task requires live multi-user editing, strict spatial constraints, and safe, concurrent transactions. It helps teams maintain authoritative data and serve low-latency spatial queries to applications. If a city is editing road closures or a utility company is updating pipe networks, PostGIS is still a strong choice.
But analytical GIS has a different shape. It is less about editing one feature and more about scanning millions or billions of rows to find patterns. Teams may want to compare historical building footprints, join national-scale mobility data, or prepare spatial features for machine learning. For these workloads, repeatedly loading everything into a transactional database can become slow, expensive, and unnecessary.
That pressure is showing up in how tools are being combined. Even PostGIS-focused companies are now exploring ways to connect PostGIS with DuckDB and GeoParquet for analytics. Other comparisons make a similar distinction: PostGIS is usually stronger for shared operational systems, while DuckDB Spatial is useful for lightweight analytical work.
The future is not PostGIS versus the lakehouse. It is a hybrid. PostGIS remains the system of record for live operational GIS, while GeoParquet, Apache Iceberg, and DuckDB support analytical spatial data stored directly on cloud object storage.
This article looks at why traditional analytical GIS workflows are breaking, how the spatial lakehouse stack works, what performance details matter, and where PostGIS still fits in the hybrid GIS future.
Why Analytical GIS Breaks Traditional Workflows
Traditional GIS analytics often begins with a long detour. A new dataset lands in cloud storage. Before anyone can ask a spatial question, the file is downloaded, converted, reprojected, loaded into a database, indexed, queried, and often exported again for another team. The workflow is familiar, but it creates friction at every step.
The first problem is duplication. The original files still exist in cloud storage, while the copied database tables live somewhere else. This increases storage cost and creates confusion about which version is current. The second problem is slow ETL (Extract, Transform, Load). Analysts may spend hours preparing data before running the first query, and for fast-moving work that delay matters. The third problem is scaling. A database often couples storage and compute more tightly than a lakehouse-style architecture does, so as datasets grow, teams may need larger database instances even when the data is only queried occasionally. The final problem is staleness. Once data is exported into local files or side databases, teams can easily end up analyzing old copies instead of the source.
This is where zero-copy data sharing becomes useful. Zero-copy does not mean zero data engineering. It means avoiding unnecessary copying before analysis.

Traditional vs. zero-copy data sharing. Source: Conduktor
In a zero-copy spatial workflow, geospatial data stays in cloud object storage in an optimized format. Multiple tools can then query the same files through governed access, rather than creating new copies for every workflow. DuckDB’s spatial extension and cloud-native geospatial tools show why this is becoming practical: analysts can run spatial SQL directly against the files, without making the database the first stop for every dataset.
The Three-Part Spatial Lakehouse Stack
A spatial lakehouse is easier to understand if we think of it like a book.
GeoParquet is the pages. It stores geospatial data in the Parquet format, which is column-oriented. Instead of reading every attribute in every row, an analytical engine can read only the columns it needs. If a query only needs geometry and population, it does not have to scan long text descriptions, timestamps, or other unused fields. GeoParquet also adds spatial metadata, so tools can understand geometry columns in a standard way.
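To make the column-pruning idea concrete, here is a minimal pure-Python sketch. It does not read a real Parquet file: the dictionary `toy_file`, the helper names, and the column names are hypothetical stand-ins. The `"geo"` entry imitates, in simplified form, the JSON that GeoParquet stores in a Parquet file's key-value metadata (`version`, `primary_column`, and per-column `encoding` and `geometry_types`).

```python
import json

# Hypothetical stand-in for a columnar file: every column is stored separately,
# so a reader can fetch just the columns a query needs.
toy_file = {
    "columns": {
        "id": [1, 2, 3],
        "geometry": [b"<wkb-1>", b"<wkb-2>", b"<wkb-3>"],  # placeholder WKB bytes
        "population": [120, 45, 310],
        "description": ["free text", "free text", "free text"],
    },
    # Simplified imitation of GeoParquet's "geo" key-value metadata.
    "metadata": {
        "geo": json.dumps({
            "version": "1.1.0",
            "primary_column": "geometry",
            "columns": {
                "geometry": {"encoding": "WKB", "geometry_types": ["Polygon"]},
            },
        }),
    },
}


def primary_geometry_column(f):
    """Find the geometry column from the metadata instead of guessing by name."""
    return json.loads(f["metadata"]["geo"])["primary_column"]


def read_columns(f, wanted):
    """Column projection: only the requested columns are touched."""
    return {name: f["columns"][name] for name in wanted}


geom = primary_geometry_column(toy_file)
projected = read_columns(toy_file, [geom, "population"])
print(sorted(projected))  # ['geometry', 'population']
```

In a real Parquet file the saving is larger than this toy suggests, because each column is stored in its own compressed column chunks, so a reader can skip fetching unread columns from storage altogether.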

Parquet file layout. Source: Cloud Native Geo
Apache Iceberg is the table of contents. A large spatial dataset may not be one file. It may be thousands of Parquet files spread across cloud object storage. Iceberg organizes those files into reliable analytical tables. It tracks snapshots, schema changes, partitions, and metadata, so different engines can read the same table without losing consistency. This is important when teams need versioned analytical data rather than loose folders of files.
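The "table of contents" role can be sketched as an append-only list of snapshots, each naming the data files that make up one version of the table. This is a deliberately simplified model, not Iceberg's actual metadata layout or API: real Iceberg tables track files through manifest lists and manifests, and their snapshot IDs are not small sequential integers. The class `ToyTable` and the file paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: frozenset  # paths of the Parquet files visible in this version


@dataclass
class ToyTable:
    """Simplified snapshot log (hypothetical; not Iceberg's real format).
    Each commit appends a new immutable snapshot instead of editing old ones."""
    snapshots: list = field(default_factory=list)

    def current(self):
        return self.snapshots[-1] if self.snapshots else Snapshot(0, frozenset())

    def commit(self, add=(), remove=()):
        base = self.current()
        files = (base.data_files - set(remove)) | set(add)
        # Sequential IDs are used here only for readability.
        snap = Snapshot(base.snapshot_id + 1, frozenset(files))
        self.snapshots.append(snap)  # older snapshots stay readable
        return snap

    def files_as_of(self, snapshot_id):
        for s in self.snapshots:
            if s.snapshot_id == snapshot_id:
                return s.data_files
        raise KeyError(snapshot_id)


table = ToyTable()
table.commit(add=["buildings/2023/part-0.parquet", "buildings/2023/part-1.parquet"])
table.commit(add=["buildings/2024/part-0.parquet"],
             remove=["buildings/2023/part-1.parquet"])

print(sorted(table.files_as_of(1)))       # version 1: both 2023 files
print(sorted(table.current().data_files))  # version 2: one 2023 file, one 2024 file
```

Because a commit never rewrites an existing snapshot, a reader that started on snapshot 1 keeps a consistent file list even while a writer commits snapshot 2. The same idea is what lets engines query older versions of a table.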
DuckDB is the reader. It is an in-process analytical database that can query files directly using SQL. With the DuckDB Spatial extension, analysts can run spatial operations from a laptop, notebook, or lightweight workflow without maintaining a separate database server. For many exploratory tasks, this makes spatial analytics feel much closer to working with files than managing infrastructure.
In practice, GeoParquet stores the spatial pages efficiently, Iceberg keeps the table organized and versioned, and DuckDB reads the data and runs the queries.
DuckDB does have a limit: scale. It runs in-process on a single machine, which makes it excellent for local and embedded analytics, but it is not the only reader. For very large distributed workloads, the same Iceberg tables can also be queried by engines such as Spark, Trino, Snowflake, Databricks, Dremio, or Apache Sedona.
That is the real value of the lakehouse pattern. The data stays in open formats, while different engines can read it depending on the size of the job.
Fast Queries Require Good Layout
A spatial lakehouse is not automatically fast just because the data is stored in the cloud. Cloud storage is cheap and flexible, but it does not know on its own which files are useful for a spatial query.
Imagine a city has millions of building footprints stored across thousands of files. If an analyst asks, “Which buildings are inside this flood zone?”, the system should not open every file in the dataset. It should first identify which files are likely to contain buildings near that flood zone, and ignore the rest.
This is where file layout matters. Iceberg can help by organizing files into partitions, such as country, region, or date, and by recording column statistics for each data file in its manifests. Parquet itself stores simple statistics, such as minimum and maximum values, for each column in each row group. GeoParquet can add spatial metadata, such as bounding boxes, so engines can quickly check whether a file overlaps the query area.
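A minimal sketch of how an engine might use this kind of metadata to decide which files to open for the flood-zone query. `FileStats`, `files_to_read`, the region names, and the paths are hypothetical; real engines read the equivalent information from Iceberg manifests and Parquet or GeoParquet metadata. Coordinates are assumed to share one projected CRS, and boxes are `(xmin, ymin, xmax, ymax)`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class FileStats:
    path: str
    region: str             # partition value (hypothetical partition column)
    bbox: Optional[BBox]    # file-level bounding box, if the writer recorded one


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Axis-aligned boxes overlap unless one lies entirely beside the other."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def files_to_read(files, region, query_bbox):
    selected = []
    for f in files:
        if f.region != region:
            continue  # partition pruning: wrong region, skip without opening
        if f.bbox is not None and not bbox_intersects(f.bbox, query_bbox):
            continue  # spatial pruning: file cannot contain matching features
        selected.append(f.path)  # overlapping, or no stats: must read to be safe
    return selected


files = [
    FileStats("region=north/part-0.parquet", "north", (0, 0, 10, 10)),
    FileStats("region=north/part-1.parquet", "north", (20, 20, 30, 30)),
    FileStats("region=north/part-2.parquet", "north", None),
    FileStats("region=south/part-0.parquet", "south", (0, 0, 10, 10)),
]
flood_zone = (5, 5, 8, 8)

selected = files_to_read(files, "north", flood_zone)
print(selected)  # part-0 overlaps; part-2 has no bbox, so it cannot be skipped
```

The file without a bounding box is kept on purpose: an engine may only skip a file when the metadata proves it cannot match.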
Spatial clustering makes this even better. If nearby features are stored close together, the engine can read less data. Methods such as Z-order and Hilbert curves help arrange spatial data so that nearby locations in the real world are also stored closer together in the file layout. Some platforms also use approaches such as liquid clustering to improve data skipping without depending only on fixed partitions.
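The two curves can be sketched in a few lines of Python. Coordinates are first snapped to a `2**bits` by `2**bits` grid; the Z-order (Morton) key interleaves the bits of the cell's x and y, and the Hilbert key uses the classic iterative `xy2d` conversion. Sorting rows by either key before writing files tends to put nearby features in the same file or row group. The function names, the `BITS` value, and the sample points are illustrative assumptions, not any particular engine's implementation.

```python
def quantize(value, lo, hi, bits):
    """Snap a coordinate in [lo, hi] to an integer grid cell in [0, 2**bits - 1]."""
    cells = 1 << bits
    i = int((value - lo) / (hi - lo) * cells)
    return min(max(i, 0), cells - 1)


def morton_key(x, y, bits):
    """Z-order key: x bits go to even positions, y bits to odd positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def hilbert_key(x, y, bits):
    """Position of cell (x, y) along a Hilbert curve on a 2**bits grid."""
    n = 1 << bits
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so consecutive keys stay spatially adjacent.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


BITS = 4
points = [("a", 0.10, 0.10), ("b", 0.90, 0.90), ("c", 0.15, 0.12), ("d", 0.85, 0.95)]


def sort_by(key_fn):
    """Order point names by their space-filling-curve key."""
    cells = [(name, quantize(px, 0.0, 1.0, BITS), quantize(py, 0.0, 1.0, BITS))
             for name, px, py in points]
    return [name for name, cx, cy in sorted(cells, key=lambda c: key_fn(c[1], c[2], BITS))]


print(sort_by(morton_key))   # ['a', 'c', 'd', 'b']: nearby points end up together
print(sort_by(hilbert_key))  # ['a', 'c', 'd', 'b']
```

Hilbert ordering usually preserves locality a little better than Z-order because consecutive Hilbert keys are always neighboring cells, while Z-order occasionally jumps across the grid.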
Fast lakehouse queries depend on good metadata and good organization. DuckDB, Sedona, Spark, or any other engine can only skip data if the files give them enough information to skip safely. Apache Sedona’s GeoParquet workflow shows how spatial filters can use bounding-box metadata before reading full geometry payloads.
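The two-step pattern described above (check a cheap bounding box first, decode the full geometry only when needed) can be sketched with the standard library. The sketch assumes point geometries in a projected CRS measured in metres, stored as little-endian 2D WKB, with per-row `xmin`/`ymin`/`xmax`/`ymax` columns similar in spirit to a GeoParquet 1.1 bbox covering column. The row layout and helper names are hypothetical, and this is not Sedona's implementation.

```python
import math
import struct


def point_wkb(x, y):
    """Encode a 2D point as WKB: byte order 1 (little-endian), type 1 (Point)."""
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_point_wkb(buf):
    order, gtype = struct.unpack_from("<BI", buf, 0)
    if order != 1 or gtype != 1:
        raise ValueError("sketch only handles little-endian 2D points")
    return struct.unpack_from("<dd", buf, 5)  # coordinates start after 5 bytes


# Hypothetical rows: cheap bbox columns next to the full geometry payload.
rows = [
    {"xmin": x, "ymin": y, "xmax": x, "ymax": y, "geometry": point_wkb(x, y)}
    for x, y in [(0.0, 0.0), (150.0, 0.0), (150.0, 150.0), (1000.0, 1000.0)]
]


def within_distance(rows, cx, cy, r):
    """Return points within r of (cx, cy) and how many geometries were decoded."""
    hits, decoded = [], 0
    for row in rows:
        # Step 1: bbox prefilter against the square that encloses the circle.
        if (row["xmax"] < cx - r or row["xmin"] > cx + r
                or row["ymax"] < cy - r or row["ymin"] > cy + r):
            continue
        # Step 2: decode the geometry and apply the exact distance test.
        decoded += 1
        x, y = decode_point_wkb(row["geometry"])
        if math.hypot(x - cx, y - cy) <= r:
            hits.append((x, y))
    return hits, decoded


hits, decoded = within_distance(rows, 0.0, 0.0, 200.0)
print(hits, decoded)  # [(0.0, 0.0), (150.0, 0.0)] 3
```

Only three of the four geometries are decoded, and one of those is then rejected by the exact test: the box is a safe but loose filter, which is why the exact predicate still has to run afterwards.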
So the lakehouse does not remove data engineering. It changes the work. Instead of spending most of the effort importing files into a database, teams spend more effort organizing files, metadata, and layouts so queries can avoid reading unnecessary data.
The Hybrid GIS Future
The future is not a clean break from PostGIS. It is a better division of labor.
PostGIS remains the operational layer: the place for live editing, spatial constraints, web applications, concurrent users, and transactional safety. The spatial lakehouse becomes the analytical layer: the place for large scans, historical datasets, cloud-native files, and machine learning pipelines. This split is already visible in practice. CARTO is working with major platforms to support Iceberg and GeoParquet as open foundations for cloud-native spatial analytics.
A recent spatial SQL article makes this difference easier to understand using NYC open data. In a spatial join that counted building polygons inside neighborhoods, SedonaDB finished in 0.24 seconds, compared with 6.4 seconds for PostGIS and 43.1 seconds for DuckDB. In a distance query that counted buildings within 200 meters of fire hydrants, SedonaDB took 1.3 seconds, PostGIS took 21.8 seconds, and DuckDB took 41.5 seconds. In a K-nearest-neighbor task that searched for the five closest hydrants to every building, SedonaDB finished in 1.8 seconds, PostGIS took 83.3 seconds, and DuckDB ran into a memory error. The article is worth reading directly because it explains the actual query patterns and practical setup behind the numbers.
These results should not be treated as a universal ranking. They show that analytical spatial workloads behave differently from operational GIS workloads. A live editing system, a national-scale building footprint analysis, and a spatial machine learning pipeline do not need the same architecture.
The real change is not that PostGIS is being replaced. It is that analytical GIS no longer has to begin by importing every dataset into a database. Operational GIS can stay close to PostGIS, while analytical GIS moves closer to open files, cloud storage, and engines built for large scans.