The Earth Index by Earthrise Media

Vision: Earth Index will be an intuitive, ultrafast search engine for environmental issue mapping, enabling point-and-click search on satellite imagery

Project Overview

Data Challenge

Speed of analysis and breadth of application

Data needs

Data practices

Public Goods, AI Training Data, GIS Mapping

Target users

Environmental advocates, policy makers, journalists

Tech team

In-house expertise
Self-taught skills
Trusted technical partner


An unsupervised ML algorithm can be applied to extract meaningful image embeddings (think of these as a DNA sequence for an image) quickly and efficiently, helping identify environmental issues in satellite imagery without human intervention or the time required to label training data.
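To make the "DNA sequence" metaphor concrete, here is a minimal, illustrative sketch of unsupervised embedding extraction: each image patch is flattened and projected onto its top principal components, yielding a compact fixed-length vector per patch with no labeled training data. The function name `patch_embeddings` and the PCA approach are illustrative assumptions, not Earthrise's actual pipeline, which uses more sophisticated models.

```python
import numpy as np

def patch_embeddings(patches: np.ndarray, dim: int = 8) -> np.ndarray:
    """Compress image patches into fixed-length embedding vectors.

    `patches` has shape (n_patches, height, width, bands). Each patch is
    flattened and projected onto its top principal components -- a simple,
    fully unsupervised way to get a compact 'fingerprint' per patch.
    """
    n = patches.shape[0]
    flat = patches.reshape(n, -1).astype(np.float64)
    flat -= flat.mean(axis=0)               # center the data
    # SVD gives the principal axes; project onto the first `dim` of them.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return flat @ vt[:dim].T                # (n_patches, dim) embeddings

# Toy example: 20 random 16x16 patches with 4 spectral bands.
rng = np.random.default_rng(0)
patches = rng.random((20, 16, 16, 4))
emb = patch_embeddings(patches)
print(emb.shape)  # (20, 8)
```

Patches whose land cover looks similar end up with nearby embedding vectors, which is what makes fast similarity search possible later.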

Project Objective

In the PJMF Data Practice’s Data to Drive Climate Action Accelerator, the Earthrise Media team set out to build a tool to index imagery akin to ‘Google search’ that will empower decision-makers to get near real-time access to satellite imagery monitoring for a plethora of environmental issues.


Increasing Availability of Satellite Imagery

Satellite imagery has become far more ubiquitous in recent years as the cost of launching satellites and collecting high-quality data from low Earth orbit has dropped. Researchers and private-sector actors can now call upon regularly updated, detailed images of the Earth's surface in the visible and infrared (IR) spectrum over time, relying on sophisticated cameras and sensors deployed on constellations, or groups of satellites, orbiting between 500 km and 1,500 km above sea level.

Sentinel-2: High Revisit, Extensive Historical Data

One such constellation, the European Space Agency's Sentinel-2, can image the same location as often as every ten days at 10-60 meters of resolution (not considered high by industry standards); this imagery is particularly well suited to monitoring changes in forests, agriculture, and climate impacts. For scientists, advocates, and others, this extended view from above over time can provide answers in hours or days that historically required months or years of costly and sometimes dangerous ground-based surveys.

Petabytes of imagery have been collected by the Sentinel-2 constellation since its mission began in 2015. For groups seeking to monitor localized changes over vast swaths of land, deciding what to look for, then accessing and interpreting this data, becomes a critical challenge.

ML for Environmental Monitoring

The team at Earthrise Media are global leaders in applying satellite imagery to environmental monitoring, developing machine learning technologies that help them analyze these stacks of images rapidly. One of their analytical products, ‘Amazon Mining Watch’, showed precisely where illegal mining was taking place in the Brazilian rainforest and aided a government crackdown operation to protect this critically endangered ecosystem. Another Earthrise product, ‘Global Plastics Watch’, was built to identify where and how plastic waste sites are changing and leaching harmful chemicals into key waterways that populations rely on for drinking water.

No More Data Labeling – Enabling Imagery Search with Unsupervised ML

Typically, constructing and deploying such ML models to classify large amounts of imagery requires a data team to build what's known as ‘training data’: manually labeling what's pictured in the images so the algorithm can learn the difference between land features such as roads, rivers, and buildings. To answer a new question, the Earthrise team has typically had to spend significant time and human resources training a new model each time they wanted to expand their work to a new use case. What if they didn't have to spend all that time and energy labeling images? What if, instead, they could build a search tool so flexible that users could enter keywords to search through stacks of images pre-classified by image features, letting them find what they wanted to monitor much, much faster?


How It Started:

“When this idea hatched in 2018, AI was still an unproven technology”

Earthrise Media are a team of data scientists and designers who build for environmental organizations. Their first platform, Global Forest Watch, was built to easily identify deforestation from satellite imagery and was adopted by the World Resources Institute in 2012.

The team built web applications for other environmental organizations through the 2010s, including Al Gore’s Climate Trace for high-speed tracking of greenhouse gas emissions at scale.

Through building these products, Earthrise Media recognized that about 80% of their development time was spent organizing information and trying to extract concepts from stacks of imagery, rather than focusing on the environmental issue at hand, e.g. plastic waste or environmental degradation. This realization refocused the team on a critical question:

“What infrastructure do we need to enable someone to quickly identify changes over time over large areas captured in satellite imagery?”

The team set out to build what they refer to as “a large Earth observation model, similar to the popular AI tool ChatGPT, which is a large language model.” Earth Index would draw from stacks of imagery instead of a large body of text, enabling users to access images that share common features of interest in a more fluid, easy-to-use manner.


Step 1

Take Sentinel-2 imagery; pre-process with Google Earth Engine

Step 2

Compress images into embedding vectors using hand-selected features such as NDVI (Normalized Difference Vegetation Index), which quantifies vegetation by measuring the difference between near-infrared light (which vegetation strongly reflects) and red light (which vegetation absorbs)

Step 3

Build and iterate on user interface
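Step 2's hand-selected NDVI feature can be sketched numerically. The toy computation below (the `ndvi` function name is illustrative, not from the Earth Index codebase) implements the standard formula the description refers to:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), bounded in [-1, 1].

    Dense vegetation pushes values toward +1; bare soil sits near 0 and
    water is typically negative. For Sentinel-2, NIR is band 8 (B08) and
    red is band 4 (B04).
    """
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / np.maximum(nir + red, 1e-9)  # avoid divide-by-zero

# Toy 2x2 scene: healthy vegetation reflects NIR strongly relative to red.
nir = np.array([[0.45, 0.50], [0.05, 0.30]])
red = np.array([[0.05, 0.08], [0.04, 0.28]])
print(np.round(ndvi(nir, red), 2))
```

Computing such band ratios per pixel and summarizing them per patch is one simple way to turn raw spectral data into the embedding features described above.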

See our full Technical Report >

Data Tools


Before: human labeling of training data, model development, and result validation.


After: unsupervised ML for computer vision; data organized using the STAC specification (for interoperability) and embeddings searched using qdrant, which is optimized for similar-image search.
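Under the hood, similar-image search reduces to nearest-neighbor lookup over embedding vectors. The brute-force sketch below (pure NumPy, illustrative only) computes the same cosine-similarity ranking a vector database like qdrant produces, minus the approximate-nearest-neighbor indexes that keep qdrant fast over millions of embeddings:

```python
import numpy as np

def most_similar(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k stored embeddings most similar to `query`.

    Brute-force cosine similarity: normalize everything to unit length,
    then a dot product gives the similarity score per stored vector.
    """
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = m @ q                       # cosine similarity per stored vector
    return np.argsort(sims)[::-1][:k]  # highest similarity first

# Toy index of 5 four-dimensional embeddings.
index = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],   # nearly identical to the query
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(most_similar(query, index, k=2))  # prints [0 1]
```

A user clicking on a mining site or waste pile effectively submits that patch's embedding as the query, and the engine returns the most similar-looking patches anywhere in the indexed imagery.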

Data Fluency

“This team was far into its data journey – we asked many questions out of our curiosity about what the underlying architecture looked like and ‘why they were querying in this way’ for example”

– Chelsey Walden-Schreiner, Data Scientist, Data Practice at PJMF and foundation technical lead for this project

Lessons Learned

The importance of human-centered design

Human centered design is essential for unlocking the power of embeddings, a powerful tool for distilling the information contained in a remote sensing image.

  • Simple searches over embeddings can quickly yield collections of visually similar images, and embeddings can capture conceptual information useful for human decision-making.
  • Embeddings inherit limitations from the data sources from which they are derived.

We Can’t Afford Everything

  • “We want to, but cannot afford to include everywhere in the world at a high resolution yet”
  • “Our cloud storage bills are through the roof!”
  • “It’s important to understand what costs money to host and call data! And publish the cost explorations for these.”
  • Costs for web hosting vary by project; see this article for some practical considerations

Local knowledge is critical

“Accurate mapping requires local knowledge – once you have this, you can find things very quickly!”

This tool is not sustainable without community input

“Right now this tool is still for experts only. Without the community of people who could use it, it would just remain a toy – a toy whose viability for substantive impact we can start to demonstrate. We were able to get important input from user testing through the Climate + Data accelerator cohort.”

This work requires a lot of up-front capital investment

“We overestimated our ability to raise venture philanthropy. Right now, we’re trying to figure out what it looks like to capitalize the project at the scale required to develop it for broad use. This is a big model that requires a ton of infrastructure; the hard part is hosting the infrastructure.”


2,802 plastic waste sites cataloged

One application of the new tool, a project called ‘Global Plastics Watch’, with support from the Minderoo Foundation, has cataloged 2,802 plastic waste sites in Indonesia, many of which were previously unknown. 22% of these waste sites lie within 250 meters of a waterway and therefore pose a significant risk of contributing to marine plastic waste.

“The flexible applicability of what the team are developing is really exciting! The ability to search imagery that quickly would be a significant technological advancement”
– Chelsey Walden-Schreiner, Data Scientist, Data Practice at PJMF and foundation technical lead for this project



Future Direction:

The team is exploring whether they can become a ‘focused research organization’ so they can continue developing the tool without taking venture capital, which might push them toward defense applications and dilute their environmental advocacy mission.

“We have an 18-month product vision based off of GPT interactions – need to figure out a way to get users to interact in an iterative way, integrating LLM with our model.”

Contact The Project Teams:

“This work would require a very serious and significant commitment from funders in order for it to remain in the public sector. This tech currently exists for defense only, serving 10,000 people, but not activists. How to keep it public is the challenge. In the next 3-4 months, we need to figure out what the structure is: led by technologists, not environmentalists, and done in a way that’s self-sustaining and can keep the right talent”