#GeoAI #Ideas #Science

AI Foundation Models for Geospatial and Earth Observation: A New Era of Earth Understanding

05.6.2025

Foundation models, which are large, pretrained AI systems, have had a significant impact on natural language processing, computer vision, and now Earth observation. These models are trained on extensive, diverse, and frequently multimodal datasets, enabling them to generalize across tasks, regions, and sensors. In the geospatial domain, they make it possible to monitor, analyze, and predict changes on our planet.

But how are foundation models different from LLMs (Large Language Models)?

Large Language Models (LLMs) are a specific type of foundation model designed primarily for tasks involving natural language. While all LLMs are foundation models, not all foundation models are LLMs. Foundation models can span various domains—including vision, language, audio, or even multimodal tasks—whereas LLMs focus specifically on understanding and generating human language. The table below summarizes the key differences and relationship between them:

Foundation Models vs LLMs

Recent advancements signal a new wave of geospatial intelligence that leverages foundation models to address challenges in climate change, disaster response, urbanization, and agriculture. This article explores the impact of foundation models on geospatial analysis.

What Are Foundation Models in the Geospatial Context?

Foundation models are deep learning architectures trained on extensive datasets to learn general representations that can be fine-tuned for specific tasks. In Earth observation (EO), they are often multimodal, having been trained on satellite imagery, radar data, topography, weather, and even text-based reports. This allows them to perform a wide array of geospatial tasks without being retrained from scratch.

Unlike traditional EO techniques such as the Normalized Difference Vegetation Index (NDVI), a handcrafted index that measures one specific phenomenon (vegetation health, from the red and near-infrared bands), foundation models learn complex patterns and features directly from data. They do not rely on fixed rules or predefined formulas but instead extract hierarchical representations through large-scale learning. This makes them more flexible when addressing diverse applications like land use mapping, deforestation tracking, or flood detection. Rather than building a separate model for each task, a single foundation model can be adapted to multiple tasks, improving efficiency and performance at scale.
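
To make the contrast concrete, here is a minimal sketch of what a handcrafted index looks like in code. NDVI is the fixed formula (NIR - Red) / (NIR + Red), applied per pixel. The function names and the 2x2 reflectance values are hypothetical, and the example uses plain Python lists to stay dependency-free; real pipelines would typically use an array library.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for a single pixel."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply the fixed NDVI formula to every pixel of two same-sized bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectance values for a 2x2 image (not real sensor data).
nir_band = [[0.50, 0.40], [0.30, 0.00]]
red_band = [[0.10, 0.40], [0.60, 0.00]]

result = ndvi_grid(nir_band, red_band)
for row in result:
    print(row)
```

The whole "model" is one line of arithmetic chosen by a human expert. A foundation model replaces that fixed rule with representations learned from data, which is why it can be adapted to tasks the formula was never designed for.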

Foundation Models in EO

Key Research and Industry Examples

1. IBM’s TerraMind

TerraMind, developed by IBM in collaboration with the European Space Agency (ESA) and recently open-sourced, is a state-of-the-art foundation model for Earth observation. It integrates nine modalities, that is, distinct types of data such as optical imagery, SAR (synthetic aperture radar), elevation models, weather data, and textual annotations.

Trained on over 524 million EO tiles, TerraMind is capable of performing semantic segmentation (e.g., detecting water bodies, vegetation types), classification (e.g., urban vs. rural), and even inferring missing modalities. Its performance significantly surpasses that of traditional single-task models, particularly in regions with limited data resources.

2. Google’s Geospatial Reasoning Models

In its April 2025 blog post, Google introduced models that integrate EO data with language and contextual data to provide geospatial reasoning. These models are used in:

- Flood forecasting
- Real-time wildfire mapping
- Population displacement tracking

By integrating satellite data with topography, hydrology, and language inputs such as field reports or emergency calls, the models can reason beyond pixels to understand context and risk.

3. Meta’s Segment Anything Model (SAM) in Remote Sensing

Although not exclusively geospatial, SAM has been fine-tuned by researchers to segment satellite images – e.g., identifying buildings, agricultural fields, or glaciers. This “click-and-segment” approach democratizes EO data interpretation for non-experts.

Other Surveys and Research

Foundation Models for Generalist Geospatial Artificial Intelligence:
- Introduces a framework for pre-training and fine-tuning foundation models on extensive geospatial data, demonstrating applications in cloud gap imputation and flood mapping.
- arXiv Preprint
Vision Foundation Models in Remote Sensing: A Survey:
- Provides a comprehensive overview of vision foundation models in remote sensing, discussing architectures, datasets, and methodologies.
- arXiv Preprint
Foundation Models for Remote Sensing and Earth Observation: A Survey:
- Systematically reviews remote sensing foundation models, categorizing them into visual foundation models, vision-language models, and large language models, and discusses their applications and challenges.
- arXiv Preprint

Applications Across Domains

Domain	Example Use Case	Description
Earth Observation	Clay Foundation Model by DevelopmentSeed	This foundation model is versatile and open-source, making it suitable for a variety of Earth observation applications. It is capable of handling diverse data sources, including satellite, drone, and SAR imagery. It employs a two-stage transformer architecture and a flexible data pipeline for scalable training.
Geospatial AI	Vision-Language Queryable Earth by Element84	It utilizes vision-language models like SkyCLIP and RemoteCLIP to enable text-based geospatial queries, allowing users to retrieve satellite images based on natural language descriptions.
Geospatial AI	Natural Language Geocoding by Element84	It employs natural language processing to interpret user queries (e.g., “Show me algal blooms within 2 miles of Cape Cod”) and displays relevant geospatial data, enhancing accessibility for non-experts.
Geospatial AI	Cloud Detection in Satellite Imagery by Element84	It develops machine learning models to automatically detect and remove clouds from satellite images, improving the accuracy of downstream geospatial analyses.
Agriculture	Crop health prediction	Foundation models detect drought stress or disease outbreaks using multispectral imagery.
Disaster Response	Post-earthquake damage assessment	Models segment damaged buildings and blocked roads using SAR + optical data.
Climate Monitoring	Glacier retreat tracking	Foundation models segment ice cover over time to monitor melting rates.
Urban Planning	Informal settlement mapping	Combining EO with census or text data to detect unplanned urban sprawl.
Forestry	Illegal logging detection	Models detect canopy loss in near real-time across large forested areas.
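
The text-based query use case in the table above typically comes down to comparing embeddings: a vision-language model maps both images and text into the same vector space, and retrieval ranks images by similarity to the query. The sketch below shows only that ranking step. It assumes the embeddings have already been computed by some CLIP-style model; the tile names and 3-dimensional vectors are made up for illustration, and no real model or library API is called.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings, top_k=2):
    """Return the ids of the top_k images most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), image_id)
        for image_id, embedding in image_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:top_k]]


# Toy embeddings (hypothetical); real ones have hundreds of dimensions.
image_embeddings = {
    "tile_harbor": [0.9, 0.1, 0.0],
    "tile_forest": [0.0, 0.2, 0.9],
    "tile_airport": [0.7, 0.6, 0.1],
}
# Stands in for the embedded text query "ships docked at a port".
query_embedding = [1.0, 0.2, 0.0]

top = rank_images(query_embedding, image_embeddings)
print(top)
```

At archive scale the linear scan would usually be replaced by an approximate nearest-neighbor index, but the underlying comparison stays the same.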

Future Directions

What’s Next? The future of this field is bright, and open-source software is an integral part of it. The development of open-source foundation models is expected to accelerate, enabling academic institutions and government agencies to leverage advanced AI capabilities without incurring substantial computational costs. Domain-specific transformer architectures will be fine-tuned for Earth observation time-series analysis and multimodal geospatial data fusion. Deployment at the edge – on drones and mobile devices – will support real-time, in-field decision-making for applications such as precision agriculture and wildfire monitoring. Additionally, integration with policy frameworks will allow these models to directly contribute to tracking progress on the United Nations Sustainable Development Goals (SDGs), including indicators such as access to clean water and land degradation.

Conclusion

Foundation models in EO represent a leap toward general-purpose geospatial intelligence. By learning from diverse data across the globe, they enable more accurate, scalable, and context-aware analysis of our dynamic Earth. As these models become more open, interpretable, and efficient, they will transform how we respond to environmental challenges and opportunities across science, policy, and society.

Sebastian Walczak

#Insights #Startups

Street-Level Imagery by Bee Maps is Redefining Mapmaking

Aleks Buczkowski

04.20.2025

It’s an exciting time for geospatial technology, particularly at the street level, where innovation isn’t just about maps—it’s about reshaping how we experience and navigate the world around us. At the forefront of this revolution is Bee Maps powered by Hivemapper, a company driven by crowdsourced mapping, AI-driven spatial intelligence, and an ambitious vision to redefine mapping. Street-level imagery has proven to be the cornerstone of mapping technology, adopted widely by industry giants. Yet, Bee Maps demonstrates that there’s still enormous room for innovation, making it an exciting entry point into the dynamic world of geospatial technology.

Goli Emami, a Stanford-educated AI scientist who joined Hivemapper about seven months ago, embodies this excitement. With a robust background in AI, including roles at prestigious companies like C3AI and internships at Uber’s autonomous driving team, Goli chose Hivemapper to dive deeply into the practical, impactful applications of AI and graph theory on real-world problems.

“I’ve always been passionate about applying theoretical computer science to tangible challenges,” shares Goli. “At Bee Maps, the speed at which we innovate and iterate is incredible. In just seven months, I feel like I’ve gained experience equivalent to five years at a traditional big tech firm.”

Bee Maps operates through a globally distributed network of drivers equipped with advanced edge computing devices that go far beyond traditional dashcams. These devices capture detailed imagery every eight meters and immediately process it with efficient AI algorithms, currently computer vision models, to detect street-level objects such as signage, construction zones, and billboards. The detections are then validated and used to generate accurate mapping information and to identify changes, so that map data stays up to date. Drivers are compensated for capturing this data and may receive additional incentives to focus on specific areas where detailed mapping data is particularly needed.
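
As an illustration of distance-based capture, the sketch below selects GPS fixes that are at least eight meters from the previously captured one. This is not Bee Maps' actual implementation, which the article does not describe; the function names, the straight-line spacing rule, and the sample coordinates are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def capture_points(gps_fixes, spacing_m=8.0):
    """Return the fixes at which a frame would be captured.

    The first fix is always captured; each later fix is captured once it is
    at least spacing_m away from the most recently captured fix.
    """
    if not gps_fixes:
        return []
    captured = [gps_fixes[0]]
    for fix in gps_fixes[1:]:
        if haversine_m(*captured[-1], *fix) >= spacing_m:
            captured.append(fix)
    return captured


# Hypothetical track heading north, one fix roughly every 3.3 meters.
fixes = [(i * 0.00003, 0.0) for i in range(10)]
frames = capture_points(fixes)
print(len(frames), "frames captured from", len(fixes), "fixes")
```

Triggering on distance rather than time keeps image density roughly even along a road regardless of vehicle speed, which is one plausible reason to specify capture in meters.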

“Our ability to rapidly iterate and expand our object detection capabilities is fascinating,” explains Goli. “If there’s a client request for specific spatial intelligence—say, identifying construction zones or certain billboards—we can deploy customized AI models within days, gathering data through incentivized crowd participation and quickly turning raw images into actionable insights.”

The company’s advanced ‘AI Trainer Pipeline’ platform incentivizes users through rewards, such as tokens or financial compensation, to label data, dramatically accelerating model training processes. Goli notes that this innovative, human-in-the-loop methodology can transform thousands of raw data points into high-quality labeled data overnight.

What’s especially thrilling for Goli, and for anyone joining the Bee Maps team, is the opportunity to work directly with cutting-edge vision language models (VLMs). These AI tools extract sophisticated metadata from imagery, making it possible to interpret complex visual data like gas prices or dynamic billboard content, even where conventional OCR fails.

Goli also emphasizes the strategic aspect of her role, dedicating much of her time to exploring emerging research, experimenting with new model architectures, and pushing the envelope of what’s possible in AI-powered geospatial solutions. “About 70% of my work involves experimentation and staying at the forefront of technology,” she says. “We’re always exploring new models, testing their capabilities, and rapidly integrating successes into our pipeline.”

Culturally, Bee Maps stands out too. With a flat organizational structure focused purely on innovation and execution, team members like Goli experience an environment free from corporate politics. “Here, it’s build, build, build—fast and efficient,” she remarks enthusiastically. “The culture was a huge draw for me. Everyone is laser-focused on progress and collaboration.”

Looking ahead, Goli sees enormous potential not just for Bee Maps but for the entire geospatial sector. From embedding AI directly into vehicle systems to creating real-time, dynamic mapping infrastructures powered by crowdsourced data, the future is rich with possibilities. Bee Maps, she believes, is perfectly positioned at this intersection of innovation, technology, and practicality.

For those considering their next career move, Bee Maps powered by Hivemapper offers more than a job—it’s an opportunity to shape the future of mapping technology. As Goli’s experience shows, joining Bee Maps means becoming part of a pioneering team that’s redefining geospatial technology, one street-level insight at a time.