#Featured #GeoDev

Key Insights from New Earth Observation Research on Data Processing Challenges

The field of Earth Observation (EO) stands at a critical inflection point. Following recent discussions at the Geospatial World Forum, our comprehensive new report titled “Addressing Earth Observation Data Annotation Challenges” dives deep into the fundamental hurdles preventing us from unlocking the full value of our orbital perspective.

Our planet is now observed by an ever-expanding constellation of satellites generating an unprecedented volume of data. NASA’s EOSDIS repository alone has accumulated over 100 petabytes and is projected to expand nearly sixfold to reach 600 petabytes within five years. With satellites generating over 2 petabytes of data daily from sources like Sentinel and Landsat, processing challenges have become increasingly complex, as highlighted in Geospatial World’s analysis. This wealth of information offers transformative potential across numerous applications – from climate monitoring to urban planning.

However, the path from raw satellite imagery to actionable insight is fraught with significant challenges, particularly in creating high-quality annotated datasets essential for training sophisticated machine learning models. As noted in our report, traditional approaches to data processing have become impractical as the sheer scale of information continues to grow exponentially.

Understanding the “Four Vs” of Big EO Data

Our report frames the challenges through the lens of the “Four Vs” of Big EO Data, providing a comprehensive framework for understanding the multifaceted nature of these obstacles:

Volume: The Scale Challenge

High-resolution satellite scenes commonly exceed 30MB per image, and projects covering substantial geographic regions quickly accumulate into multi-gigabyte or even terabyte-scale datasets. These large files create significant performance bottlenecks in traditional annotation systems, resulting in:

  • Sluggish loading times
  • Navigation difficulties
  • Frequent interface lags
  • Inability to effectively examine details by zooming or panning

When annotators need to examine details by zooming into specific regions, conventional tools often struggle to maintain responsiveness. The resulting delays directly impact both the efficiency of the annotation process and the quality of the resulting training data.
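To make the scale problem concrete, here is a minimal sketch (not from the report) of how tiled, windowed access avoids loading an entire scene into memory. It assumes the Python rasterio library and a hypothetical large GeoTIFF named scene.tif:

```python
# Minimal sketch: windowed access to a large GeoTIFF (hypothetical file name)
import rasterio
from rasterio.windows import Window

with rasterio.open("scene.tif") as src:
    # Zoomed-out preview: a decimated read, which uses overviews when present
    preview = src.read(1, out_shape=(src.height // 16, src.width // 16))

    # Full-resolution detail: read only a 1024 x 1024 pixel window
    window = Window(col_off=4096, row_off=4096, width=1024, height=1024)
    detail = src.read(1, window=window)

print(preview.shape, detail.shape)
```

Reading a decimated overview for the zoomed-out view and a full-resolution window only for the region under inspection is what keeps an interface responsive on multi-gigabyte scenes.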

Velocity: The Data Stream Challenge

Beyond just the volume, the rate at which new Earth Observation data arrives creates processing challenges. As satellite technology advances, sensors capture imagery with increasingly frequent revisit times, generating continuous streams requiring near-real-time processing.

The conventional approach of downloading complete datasets to local systems before beginning analysis has become impractical due to transmission bottlenecks, introducing delays that undermine the value of time-sensitive applications. Recent research points to “a need for datasets specifically tailored for spatio-temporal ML methodologies, aiming to train ML methods to forecast and explain the impacts of extreme events,” according to Nature Scientific Data.
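As an illustration of streaming rather than bulk downloading, the sketch below queries a public STAC catalogue and reads only a small window of a cloud-optimized GeoTIFF over HTTP. It assumes the pystac-client and rasterio packages, the public Earth Search endpoint, and that the returned Sentinel-2 items expose an asset named "red"; treat these as assumptions rather than a fixed recipe:

```python
# Minimal sketch: stream a small chip from a remote scene instead of downloading it
from pystac_client import Client
import rasterio
from rasterio.windows import Window

catalog = Client.open("https://earth-search.aws.element84.com/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[13.0, 52.3, 13.7, 52.7],          # west, south, east, north
    datetime="2024-06-01/2024-06-30",
    max_items=1,
)
item = next(search.items())

# Read only the pixels we need via HTTP range requests
href = item.assets["red"].href
with rasterio.open(href) as src:
    chip = src.read(1, window=Window(0, 0, 512, 512))

print(item.id, chip.shape)
```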

Variety: The Heterogeneity Challenge

Earth Observation data encompasses an extremely diverse ecosystem of information sources across multiple sensor types – optical, radar, thermal, and hyperspectral – with widely varying:

  • Spatial resolutions (from sub-meter to kilometers)
  • Temporal frequencies
  • Spectral bands
  • File formats (GeoTIFF, netCDF, HDF5, and proprietary formats)

This incredible variety creates substantial obstacles when attempting to integrate or compare information across different observation systems. The heterogeneity of EO data presents significant preprocessing challenges for machine learning engineers. Satellites provide imagery ranging from 500m (MODIS) to sub-meter (WorldView) resolutions, each with different file formats that require specialized processing pipelines, as detailed in this Geospatial World analysis.

According to researchers in Earth Science Informatics, “Without a well-defined, organized structure they face problems in finding and reusing existing data, and as consequence this may cause data inconsistency and storage problems.” This technical complexity includes dealing with specialized reference systems and scaling up processing to large amounts of data.
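A common first preprocessing step is to bring heterogeneous sources onto a shared grid. The sketch below, which assumes the rioxarray package and hypothetical file names, resamples a coarse MODIS-style raster onto a Sentinel-2-style grid so the two can be compared pixel by pixel:

```python
# Minimal sketch: harmonize two rasters with different CRS and resolution
# (file names are hypothetical)
import rioxarray

modis = rioxarray.open_rasterio("modis_ndvi_500m.tif")
s2 = rioxarray.open_rasterio("sentinel2_b04_10m.tif")

# Reproject and resample the coarse raster onto the Sentinel-2 grid so both
# sources share one CRS, resolution, and pixel alignment
modis_on_s2_grid = modis.rio.reproject_match(s2)
print(modis_on_s2_grid.rio.crs, modis_on_s2_grid.shape)
```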

Veracity: The Quality Challenge

The trustworthiness and reliability of Earth Observation data present another layer of challenge. Raw satellite imagery is susceptible to numerous quality issues:

  • Sensor imperfections
  • Atmospheric conditions
  • Geometric distortions during acquisition
  • Artifacts introduced during processing

When creating training datasets for machine learning applications, additional veracity challenges emerge, including class imbalance problems and difficulties in obtaining sufficient high-quality annotated examples. These factors can significantly impact the reliability of analytical results.

The International Organization for Standardization (ISO) defines five quantitative spatial data quality elements: completeness, logical consistency, positional accuracy, temporal accuracy, and thematic accuracy, as discussed in PLOS Neglected Tropical Diseases. However, the same research found that “extensive use is made of EO products in the study of NTD epidemiology; however, the quality of these products is usually given little explicit attention.”

Recent research published in Nature Communications highlights that “ignoring the spatial distribution of the data led to the deceptively high predictive power of the model due to spatial autocorrelation (SAC), while appropriate spatial model validation methods revealed poor relationships.” This means that unaddressed spatial dependencies between training and test datasets significantly affect model generalization capabilities.
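One practical way to avoid the deceptively high scores described above is spatially blocked cross-validation, where entire spatial blocks are held out together so autocorrelated neighbours never straddle the train/test split. The sketch below uses scikit-learn's GroupKFold on synthetic data to illustrate the idea; the block size, features, and labels are illustrative assumptions, not values from the cited studies:

```python
# Minimal sketch: spatially blocked cross-validation with synthetic data
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000
coords = rng.uniform(0, 100_000, size=(n, 2))   # metres, hypothetical AOI
X = rng.normal(size=(n, 8))
y = rng.integers(0, 2, size=n)

# Assign each sample to a 10 km x 10 km block; whole blocks are held out together
block_ids = (coords[:, 0] // 10_000) * 100 + (coords[:, 1] // 10_000)

cv = GroupKFold(n_splits=5)
scores = []
for train_idx, test_idx in cv.split(X, y, groups=block_ids):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(f"spatially blocked CV accuracy: {np.mean(scores):.3f}")
```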

Beyond the Four Vs: Additional Critical Challenges

The Precision Challenge

Annotating satellite imagery requires exceptional precision to accurately capture the diverse and often complex features present in Earth Observation data. Unlike everyday photographs, satellite imagery contains specialized geographical information and often requires domain expertise to interpret correctly.

Features like urban structures, agricultural fields, forest boundaries, or disaster impacts must be delineated with high accuracy to create training data that enables effective machine learning models. Without access to geographic coordinate systems, distance measurements, or tools designed specifically for geospatial contexts, annotators may struggle to accurately identify and annotate features of interest.
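Preserving the link between pixels and the ground is largely a matter of carrying the raster's affine transform and coordinate reference system through the annotation pipeline. The sketch below, assuming rasterio and a hypothetical scene.tif, converts pixel-space polygon vertices into coordinates in the image's CRS:

```python
# Minimal sketch: convert pixel-space annotation vertices to map coordinates
import rasterio
from rasterio.transform import xy

with rasterio.open("scene.tif") as src:
    polygon_px = [(120, 340), (120, 410), (180, 410), (180, 340)]  # (row, col)
    # Use the raster's affine transform to recover coordinates in its CRS
    polygon_geo = [xy(src.transform, row, col) for row, col in polygon_px]
    print(src.crs, polygon_geo)
```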

Domain expertise is crucial, with some studies finding that misclassification occurs “because the labelers had misclassified what to identify as rubble in the images, which may have happened because the labeling process was undertaken by non-experts who lacked the necessary understanding,” as noted in our research.

The Multi-Spectral Challenge

One of the most powerful but challenging aspects of Earth Observation data is its multi-spectral nature. Unlike standard RGB photography, satellite sensors can capture information across numerous wavelength bands, including those outside the visible spectrum.

This multi-spectral data enables the analysis of invisible characteristics and conditions. For instance, in agriculture, different bands can indicate plant health, moisture content, or pest stress before such conditions are visible to the human eye, enabling early intervention. However, working with multi-spectral imagery presents significant challenges for annotation workflows, as annotators must be able to view and switch between different spectral bands to identify features that may be apparent in one wavelength but not in others.
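A classic example of information that is invisible in RGB is the Normalized Difference Vegetation Index (NDVI). The sketch below assumes a Sentinel-2-style band stack where red is band 4 and near-infrared is band 8; the file name and band order are hypothetical:

```python
# Minimal sketch: compute NDVI from red and near-infrared bands
import numpy as np
import rasterio

with rasterio.open("sentinel2_stack.tif") as src:
    red = src.read(4).astype("float32")
    nir = src.read(8).astype("float32")

# NDVI highlights vegetation vigour: healthy plants reflect strongly in NIR
ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)
print(float(ndvi.min()), float(ndvi.max()))
```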

Recent approaches recognize the importance of creating “harmonised datasets tailored for spatio-temporal ML methodologies,” according to Nature Scientific Data, and developing “cost-efficient resource allocation solutions… which are robust against common uncertainties that occur in annotating and extracting features on satellite images,” as highlighted by the IET Digital Library.

The integration of multiple data sources presents challenges in bias correction, with research from Springer noting that “if the biases are not corrected before data fusion we introduce further problems, such as spurious trends, leading to the possibility of unsuitable policy decisions.”

The Collaboration Challenge

Earth Observation projects often involve diverse stakeholders including government agencies, research institutions, non-governmental organizations, and private sector entities. In our report, we identify several barriers to seamless collaboration:

  • Technical incompatibilities (diverse data formats, inconsistent metadata)
  • Restrictive data policies limited by national laws or institutional regulations
  • Intellectual property concerns regarding ownership of annotations
  • Lack of governance frameworks establishing clear rules for data access
  • Trust and awareness issues between potential collaborators
  • Capacity and resource gaps, particularly for organizations in developing nations

These collaboration barriers directly impact annotation quality and efficiency, as teams struggle to coordinate efforts, share knowledge, and maintain consistent standards across different annotators and geographic regions.

Spatial-Temporal Complexity: A Fundamental Challenge

A critical aspect highlighted in recent research is the unique nature of spatiotemporal data mining. Earth observation data points are not independent and identically distributed, violating fundamental assumptions of classical machine learning approaches, according to research from NIH. Environmental processes exhibit dynamic variability across space and time domains, creating what researchers call “spatial heterogeneity and temporal non-stationarity.”

This intrinsic spatial-temporal complexity requires specialized methodologies. The workflow often requires extensive preprocessing stages, with some researchers proposing structured pipelines for “enhancing model accuracy, addressing issues like imbalanced data, spatial autocorrelation, prediction errors, and the nuances of model generalization and uncertainty estimation,” as highlighted in Nature Communications.

Key Requirements for Effective EO Data Processing

Our report highlights several fundamental requirements for addressing Earth Observation data challenges:

Technical Requirements

The increasing complexity of EO data demands specialized approaches to processing and annotation. According to our report, effective systems must address scalability concerns through optimized handling of massive imagery files, preserve geographic context during annotation, support multi-spectral data visualization, and enable quality control mechanisms that ensure data reliability.

Traditional image annotation software may not support direct use of geospatial metadata embedded within images, making it challenging for users to work with GeoTIFF files effectively, as noted in our analysis. A significant challenge is making Earth observation data “analysis-ready” – meaning properly preprocessed, structured, and formatted to enable direct analysis without extensive technical preparation.
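A lightweight “analysis-ready” check can catch many of these issues before annotation starts. The sketch below, assuming rasterio and a hypothetical file, inspects the geospatial metadata that annotation tooling depends on, such as the CRS, nodata value, internal tiling, and overviews:

```python
# Minimal sketch: sanity-check a GeoTIFF's geospatial metadata before annotation
import rasterio

with rasterio.open("scene.tif") as src:
    report = {
        "has_crs": src.crs is not None,
        "has_nodata": src.nodata is not None,
        "has_overviews": len(src.overviews(1)) > 0,
        "is_tiled": src.profile.get("tiled", False),
        "band_count": src.count,
    }

print(report)
```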

Process Requirements

Beyond technical capabilities, our report emphasizes the importance of collaborative workflows that facilitate knowledge sharing between domain experts and annotators, standardized methods for quality assessment, and security frameworks that protect sensitive geospatial information.

These requirements align with established ISO standards for spatial data quality, which define quantitative elements including completeness, logical consistency, positional accuracy, temporal accuracy, and thematic accuracy.

Emerging Solutions and Hybrid Approaches

Researchers increasingly recognize the need for hybrid approaches that combine physical models with data-driven machine learning. Nature’s research suggests that “the next step will be a hybrid modelling approach, coupling physical process models with the versatility of data-driven machine learning.”

This combines the theoretical understanding from physics-based models with the pattern-recognition capabilities of machine learning, particularly for addressing complex Earth system challenges through deep learning methods that can automatically extract spatial-temporal features.

Recent developments to address these challenges include:

  1. Data cube architectures that provide standardized access to analysis-ready data (see the sketch after this list)
  2. Specialized geospatial annotation platforms with coordinate reference system support
  3. Tools that enable visualization and annotation across multiple spectral bands
  4. Foundation models like SAM2 to accelerate annotation processes
  5. Quality control systems with consensus mechanisms and programmatic validation
  6. Spatiotemporal ML methodologies designed specifically for Earth observation data
  7. Reproducible data processing pipelines that combine multi-modal data
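To illustrate the data cube idea from item 1, the sketch below stacks a few co-registered single-date rasters along a time dimension using xarray and rioxarray; the file names and dates are hypothetical:

```python
# Minimal sketch: build a small (time, y, x) data cube from single-date rasters
import pandas as pd
import rioxarray
import xarray as xr

paths = ["ndvi_2024-05-01.tif", "ndvi_2024-06-01.tif", "ndvi_2024-07-01.tif"]
dates = pd.to_datetime([p.split("_")[1].removesuffix(".tif") for p in paths])

# Stack the scenes along a new "time" dimension to form a small cube
cube = xr.concat(
    [rioxarray.open_rasterio(p).squeeze("band", drop=True) for p in paths],
    dim=pd.Index(dates, name="time"),
)
print(cube.dims, cube.sizes)
```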

Applications and Future Directions

Real-World Applications Highlighted in Our Report

Our report presents several case studies demonstrating the real-world impact of advanced geospatial data processing:

  • Disaster Response: The xBD dataset project for building damage assessment from post-disaster satellite imagery, showcasing the importance of standardized annotation frameworks across different disaster types
  • Environmental Mapping: The Global Ecosystems Atlas Project, creating comprehensive maps of marine, terrestrial, and freshwater ecosystems in the Maldives
  • Agriculture: Crop monitoring applications utilizing multi-spectral data to detect plant health issues and pest infestations before they become visible to the human eye

These applications illustrate how overcoming EO annotation challenges translates to tangible benefits across critical domains.

Future Trends

Looking forward, our report identifies several emerging directions in geospatial data processing, including AI integration through initiatives like ESA’s Φ-lab, efforts to bridge the digital divide in EO capabilities, multi-modal data integration, edge computing approaches, and advanced privacy protections as imagery resolution continues to increase.

The Path Forward

The insights presented in our report underscore a fundamental truth: the future of Earth Observation depends on our ability to effectively manage and utilize the vast amounts of data we collect. By implementing the capabilities outlined in this research, organizations can overcome the critical challenges in geospatial data processing and annotation, creating higher-quality training datasets that enable more robust and effective machine learning models.

These advancements are not merely technical achievements but essential steps toward harnessing the power of Earth Observation to address some of humanity’s most pressing global challenges – from climate change monitoring and disaster response to sustainable agriculture and urban planning.

For a deeper dive into these challenges and solutions, our full “Addressing Earth Observation Data Annotation Challenges” report serves as a valuable guide for understanding the landscape of geospatial data processing and the essential capabilities required to transform the potential of Earth Observation into tangible impact.


About Kili Technology

Kili Technology began as an idea in 2018. Edouard d’Archimbaud, co-founder and CTO, was working at BNP Paribas, where he built one of the most advanced AI Labs in Europe from scratch. François-Xavier Leduc, our co-founder and CEO, knew how to take a powerful insight and build a company around it. While all the AI hype was on the models, they focused on helping people understand what was truly important: the data. Together, they founded Kili Technology to ensure data was no longer a barrier to good AI. By July 2020, the Kili Technology platform was live and by the end of the year, the first customers had renewed their contract, and the pipeline was full. In 2021, Kili Technology raised over $30M from Serena, Headline and Balderton.

Today Kili Technology continues its journey to enable businesses around the world to build trustworthy AI with high-quality data.

#Ideas

Will AI Eat Geospatial?

The LinkedIn post by Will Cadell and the question raised by Peter Rabley during his keynote at GeoIgnite 2025 were just too catchy to ignore, even though I wasn’t in the room to hear the keynote live. The question isn’t just catchy because it taps into general concerns about AI; it also challenges a long-held belief in the industry: “spatial is special.” It surfaces a deeper tension within the geospatial community of practitioners (as Will called them in one of his recent blog posts):

are we building the future, or is the future being built around us?

So will AI eat geospatial? …Not Exactly, But It Will Redesign the Menu

AI isn’t exactly going to replace geospatial, but it is certainly going to make it much easier to abstract away some of the unnecessary complexity that limits the adoption of the technology. AI is already changing workflows, expectations, and architectures in ways we cannot ignore.

As a grad student in the late 2000s, one of the most boring things I had to do was manually digitize several datasets, including bathymetry maps, and then carry out a GIS analysis to identify where polymetallic nodules might be found. My dad jokingly called it “geospatial painting” because all I seemed to be doing was tracing lines on a screen (and yes, that was exactly what I was doing). It’s hard to imagine that kind of manual work being necessary today. AI can now extract and digitize contour lines from scanned maps in minutes.

AI is fundamentally shifting how geospatial tasks are approached. Imagery analysis, which once relied heavily on manual classification, is now being handled by deep learning and foundation models capable of interpreting complex patterns. Traditional data fusion methods, typically based on stacking layers within a GIS, are giving way to AI-driven techniques that can integrate diverse sources such as SAR, optical imagery, vector data, and IoT feeds in a more dynamic and contextual manner. Searching spatial data no longer depends solely on coordinate-based queries or bounding boxes; natural language interfaces are emerging, allowing users to ask location-based questions conversationally. Cartography is also evolving: static maps are being replaced by generative, context-aware representations that can adapt based on user needs. Finally, change detection, previously guided by fixed thresholds, is increasingly managed by AI models trained to identify anomalies over time. These shifts point to a broader transformation: from geospatial tools to geospatial reasoning.

This shift demands more than new algorithms. It requires rethinking the structure of spatial data itself. One notable recent effort is the OGC’s AI+DGGS Pilot, which reimagines geospatial architecture to align with the strengths of AI. Using Discrete Global Grid Systems (DGGS), the project lays the groundwork for scalable, consistent, and AI-ready data systems. It is not just about integrating AI into GIS; it is about making geospatial data interoperable, structured, and ready for reasoning at scale.
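To make the DGGS idea concrete, the sketch below indexes two nearby observations into the same global hexagonal grid using Uber's H3 (assuming the h3-py package with the v4 API); once data lives on a shared grid, joins across heterogeneous sources become simple cell-ID comparisons rather than geometry operations:

```python
# Minimal sketch: index observations into a discrete global grid (H3, v4 API)
import h3

# Two nearby observations land in the same resolution-8 hexagonal cell
cell_a = h3.latlng_to_cell(52.5200, 13.4050, 8)   # Berlin
cell_b = h3.latlng_to_cell(52.5205, 13.4049, 8)   # a nearby point

# Cells are plain strings, so cross-source joins become string comparisons
print(cell_a, cell_b, cell_a == cell_b)
print(h3.grid_disk(cell_a, 1))                    # the cell plus its six neighbours
```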

So yes, AI will consume:

  • Repetitive workflows

  • Manual image labeling and feature extraction

  • Rigid pipelines that cannot scale or adapt

  • Low-value services that offer little beyond automation

The core essence of the work remains ours to figure out: deriving insights from the data by asking better questions and helping decision-makers make more informed decisions.

What Powers AI? Magic? Nope, it’s Data!

As Andrew Ng, co-founder of Google Brain and Coursera, put it:

“The dominant paradigm over the last decade was to download the dataset while you focus on improving the code. But for many practical applications, it’s now more productive to hold the neural network architecture fixed, and instead find ways to improve the data.”
Andrew Ng, Data-Centric AI, NeurIPS 2021

Great Geo+AI does not start with a model. It starts with thoughtful, well-structured data and the geospatial expertise to get it right. That makes good-quality labeled data for training your AI application even more important. No wonder we are seeing a rise in the number of companies specializing in and providing high-quality labeled geospatial data.

A recent guide by Kili Technology offers a practical reminder. Earth Observation data is noisy, heterogeneous, and challenging to annotate. Whether identifying rooftops, detecting roads, or classifying land use, the performance of AI models is directly tied to the quality of labeled data they are trained on.

The guide makes it clear that AI does not eliminate the need for human input. It increases the demand for domain expertise, structured workflows, and quality control. It looks like the human in the loop is not a thing of the past just yet. In this context, AI is not eating geospatial. It is relying on (geospatial) professionals to feed it high-quality, annotated, and context-rich data.
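Quality control of labels can itself be partly automated. The sketch below computes inter-annotator agreement (intersection over union) between two binary masks of the same scene, using synthetic arrays purely for illustration:

```python
# Minimal sketch: inter-annotator agreement (IoU) between two binary label masks
import numpy as np

rng = np.random.default_rng(42)
annotator_a = rng.random((256, 256)) > 0.7
annotator_b = annotator_a.copy()
annotator_b[:10] = ~annotator_b[:10]          # simulate disagreement on some rows

intersection = np.logical_and(annotator_a, annotator_b).sum()
union = np.logical_or(annotator_a, annotator_b).sum()
iou = intersection / union
print(f"inter-annotator IoU: {iou:.3f}")      # flag scenes below an agreed threshold
```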

Reality Check: Trust, Ethics, and Governance

As Anusuya rightly points out in her post on the topic, AI has been part of the geospatial world for a while now, but it is now time to innovate, govern, and collaborate better. She has wonderfully summarized the key points in her reflections:

  • Performance of AI trained on synthetic data is degrading

  • Customer trust in AI-generated content is declining

  • AI-related incidents rose by 56 percent last year

These are not minor issues. They signal deeper challenges in the scalability and governance of AI systems. Peter’s call to action was clear. Moving forward, the geospatial sector must:

  • Embed ethics and human oversight from the start

  • Prioritize data integrity, provenance, and trust

  • Embrace openness and minimize environmental impact

These principles are enablers of innovation.

It’s time to evolve

So, will AI eat geospatial?

No. But it will reshape the value chain. And it will expose which parts of our field are ready to evolve and which are overdue for reinvention. AI is not coming for maps. It is coming for process inefficiencies, siloed data, and workflows that cannot scale. That is an opportunity.

The real work now is not to defend geospatial from AI. It is to redefine what geospatial can become with AI. Better tools, better decisions, and better systems for the world we are mapping and managing.

Let us not just respond to the future. Let us design it.
