Pecan Street Incorporated

Vision: Apply Machine Learning techniques to improve soil carbon content estimates and promote greater use of regenerative farming practices. Provide a computationally efficient method for simple “what-if?” type tools for farmers making decisions about what to plant and when to maximize soil carbon sequestration.

Read about Regenerative Agriculture from the NRDC >

Project Overview

Data Challenge

Identify pathways to reduce and optimize physical sampling, using model estimates to improve soil carbon measurement and verification

Data needs

Data practices

Predictive modeling

Target users


Tech team

In-house expertise
Self-taught skills
Outsourced technical partner
More about the team >

PSI Project Team

2-Person Tech Team

1 Data Scientist
1 Data Engineer



Statement: Applying artificial intelligence approaches to soil carbon modeling can yield a reliable, low-cost tool for estimating soil carbon sequestration, improving the market utility of these approaches; an artificial neural network (ANN) proxy model could unlock efficiencies for agricultural users who want to run what-if scenarios.


Research through the PJMF Accelerator was not a slam dunk

“The Patrick J. McGovern Foundation (PJMF) Accelerator created an exciting opportunity for Pecan Street’s data team to develop foundational skills in using AI/ML techniques and expand our experience with Amazon Web Services (AWS). With the advice and support of Chelsey Walden-Schreiner and others on the PJMF team, we built our first artificial neural network (ANN)!”
– PSI Team


Soil Carbon Capture . . .

Soil carbon capture has become a hot topic in recent years as countries seek to fulfill their Paris Agreement commitments to cut emissions of carbon dioxide, the most abundant human-caused greenhouse gas. Soil does not simply trap carbon by itself. Plants use chlorophyll, the pigment that powers photosynthesis in all plant life on earth, to capture atmospheric CO2 and convert it into the complex sugars that feed plant growth. Over time, as plants die and decay, the carbon captured in their roots, stems, and leaves makes its way back into the soil, buried underground.

The impacts of farming on soil carbon retention . . .

Left alone, landscapes can sequester massive amounts of atmospheric carbon in their soil each year, becoming evergreen carbon sinks as long as they remain untouched. But food production for a growing population means that a significant proportion of land cannot simply be left alone, and modern farming practices have a significant impact on soil's ability to retain that captured carbon. As farmers till soil before planting crops, pockets of air form in the ground, creating the conditions for microbes to consume decaying plant matter and emit the captured carbon back into the atmosphere as CO2. Some estimates indicate that soil-carbon emissions account for up to 20% of yearly human-induced greenhouse gas emissions globally. For perspective, that's more than all the vehicles on earth emit annually.

Adoption of new farming practices to reduce carbon emissions . . .

Pioneering regenerative farmers, with support from climate-friendly policy and from companies looking to offset their carbon footprints, are adopting practices that keep more of the carbon in their fields from ending up back in our air. Low- or no-till agriculture, in combination with planting certain cover crops, can make a huge difference in how much carbon stays underground. Yet it is extremely difficult for a farmer to know with any accuracy how much carbon their fields sequester. Farmers face complex decisions about what and when to plant to increase carbon storage while keeping production yields high. Answering the soil carbon capture equation is becoming increasingly important as carbon credit markets develop and companies and governments offer farmers new monetary incentives to offset emissions by the metric ton.

Current decision-making tools are lacking . . .

Not enough user-friendly tools exist to help farmers make these tradeoffs and communicate the environmental benefits of their choices confidently and accurately. The team at Pecan Street Inc. set out to create a tool that lets farmers model ‘what-if’ scenarios, aiding the decision-making process and helping them understand just how much carbon they are capturing in their fields and what they can do to keep it in the ground.

Read more:

How it started:

“The work was so interesting, I’d sit and wait for results and try to tune data, and then I would hear my wife behind me and I didn’t even know she was home yet … This was one of the more fun projects I’ve worked on in the past couple of years.”

Pecan Street wanted to improve the Decision Support System for Agrotechnology Transfer (DSSAT) by establishing the sensitivity of each of its approximately 140 inputs and providing viable recommendations to fill data gaps.

The team set out to understand the sensitivity of the industry-standard, computationally intensive DayCENT model to its inputs (running the model requires specialized computing knowledge).

In reviewing how this model worked, and what resources and knowledge it took to run, the team quickly determined that a small farmer would never be able to use it at their kitchen table.

The team wondered if it would be possible to create a proxy model with just 10 inputs that might give them a useful result that helps farmers make better decisions to maximize soil carbon storage.

The project plan was well aligned with the PSI mission of advancing carbon-emission-reduction technologies by identifying missing datasets and trying to fill gaps. There was enthusiastic support from the board, as it seemed like the perfect opportunity to ‘do AI stuff’ for the first time.



Studying the underlying agronomy and model and choosing variables


Determining tools for use and development of variable distribution/sampling


Setting up a compute environment for the computationally intensive DayCENT model
  • Ran hundreds of thousands of DayCENT jobs in the AWS cloud to generate training data
  • Used the scikit-optimize Python library and Jupyter notebooks to build the compute environment
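As a rough sketch of how input parameter sets for batch simulation runs can be generated, here is a minimal, NumPy-only Latin hypercube sampler (the team used the scikit-optimize library; the input names and ranges below are hypothetical, for illustration only):

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=None):
    """One point per equal-width stratum in each dimension, with strata
    shuffled independently per dimension, so the extremes and the center
    of every variable's range all get covered."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)          # shape (n_dims, 2)
    n_dims = bounds.shape[0]
    # A random offset inside each of n_samples strata on [0, 1) ...
    u = (rng.random((n_samples, n_dims)) + np.arange(n_samples)[:, None]) / n_samples
    # ... then shuffle the strata independently for each dimension.
    for d in range(n_dims):
        u[:, d] = rng.permutation(u[:, d])
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + u * (hi - lo)

# Hypothetical DayCENT-style inputs (names and ranges are illustrative only):
# bulk density (g/cm^3), soil pH, annual precipitation (mm)
bounds = [(1.0, 1.8), (4.5, 8.5), (200.0, 1500.0)]
samples = latin_hypercube(1000, bounds, seed=42)
print(samples.shape)  # (1000, 3)
```

Each row of `samples` would become one simulation job; unlike plain random sampling, no region of any variable's range is left empty.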


Vetting the selected tools for creation and development of the AI/ML approaches . . .
  • The team dug into the math and realized the potential of Latin hypercube sampling to fully exercise the model and ensure equal coverage at the extremes and center of each variable's range
  • Experimented with different optimizers (e.g., SGD, Adam, and SGD with momentum), activation functions for the hidden layers (Tanh and ReLU), and scalers (MinMax, PowerTransformer, and RobustScaler)
  • Explored PyTorch, TensorFlow, and NumPy arrays to generate tensor output files for import into the ANN training software; the ultimate decision was to use PyTorch
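The team's actual networks were built in PyTorch; as a library-free illustration of the ingredients named above (min-max input scaling, a ReLU hidden layer, and SGD with momentum), here is a minimal NumPy sketch that trains a tiny proxy network on a hypothetical stand-in function rather than real DayCENT output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the slow simulator: a smooth nonlinear function.
def toy_simulator(X):
    return np.sin(X[:, 0]) + X[:, 1] * X[:, 2]

X = rng.uniform(-1.0, 1.0, size=(512, 3))
y = toy_simulator(X)[:, None]

# Min-max scale inputs to [0, 1] (one of the scalers the team tried)
lo, hi = X.min(axis=0), X.max(axis=0)
Xs = (X - lo) / (hi - lo)

# One hidden ReLU layer, trained full-batch with SGD + momentum
W1 = rng.normal(0.0, 0.5, (3, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.5, (32, 1)); b2 = np.zeros(1)
params = [W1, b1, W2, b2]
vel = [np.zeros_like(p) for p in params]
lr, mom = 0.02, 0.9

losses = []
for epoch in range(200):
    h = np.maximum(Xs @ W1 + b1, 0.0)       # ReLU hidden layer
    pred = h @ W2 + b2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the mean-squared-error loss through the two layers
    g_pred = 2.0 * err / len(Xs)
    gW2 = h.T @ g_pred
    gb2 = g_pred.sum(axis=0)
    g_h = g_pred @ W2.T
    g_h[h <= 0.0] = 0.0                     # ReLU gradient mask
    gW1 = Xs.T @ g_h
    gb1 = g_h.sum(axis=0)
    # SGD with momentum, updating each parameter in place
    for p, v, g in zip(params, vel, [gW1, gb1, gW2, gb2]):
        v *= mom
        v -= lr * g
        p += v

print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In PyTorch the same experiment is a few lines with `torch.optim.SGD(momentum=0.9)` and `nn.ReLU`; the NumPy version just makes the moving parts explicit.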

Data tools

Before the program:

The PSI team was already experienced with ‘Big Data’ and fluent in the language of supercomputing, but had not worked with machine learning.

After the program:

Successfully deployed an artificial neural network

Data fluency

The PSI team's big-data fluency was already strong; what changed was their analysis fluency. Before, when reading papers across sectors, they had to take what ML teams reported on trust. Now, when they reach those sections, they can judge which ML techniques were used and whether each was a good choice.

“It’s like I got a new module with my BS detector!”

Lessons Learned

Creating an effective proxy of the complex DayCENT model was challenging

The processing speed of an ANN is extraordinary. It is possible to explore what-if scenarios millions of times faster than by running the native DayCENT model.

Learned a lot about running jobs in AWS

“We had run Docker before, in a more manual process. Building the Docker container was a pain, but really fun.”

Fail early, often, and just get over it!

“Failing means you need to pivot.”

Not all data explorations produce the hoped-for results

This project was pure research, and with that kind of work you sometimes don't get the answer you expect. Not all data explorations produce the hoped-for results, especially when applying AI/ML techniques for the first time.

Be cautious in overinterpreting results!

Promising results from a simple ANN began to break down under closer scrutiny and as the number and complexity of inputs and hidden layers increased.

Data was not always correct for various toolsets

In early trials, data was not always formatted correctly for the various toolsets, and PSI ran into errors when scaling input and output variables.
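As a hypothetical illustration (not necessarily PSI's actual bug), one classic way input/output scaling goes wrong is refitting the scaler on new data instead of reusing the parameters fitted on the training set:

```python
import numpy as np

rng = np.random.default_rng(1)

train = rng.uniform(0.0, 100.0, size=(1000, 2))
new = rng.uniform(40.0, 60.0, size=(5, 2))   # narrower range at inference time

# Correct: reuse the min/max fitted on the training data
lo, hi = train.min(axis=0), train.max(axis=0)
scaled_ok = (new - lo) / (hi - lo)

# Bug: refitting the scaler on the new batch silently changes the mapping,
# stretching the narrow batch to cover all of [0, 1]
scaled_bad = (new - new.min(axis=0)) / (new.max(axis=0) - new.min(axis=0))

# The same physical values now land in very different places in [0, 1],
# so a model trained on scaled_ok-style inputs sees garbage.
print(np.abs(scaled_ok - scaled_bad).max() > 0.1)  # True
```

The same trap exists in reverse for outputs: predictions must be inverse-transformed with the training-set scaling parameters, or the results are off by an arbitrary affine factor.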

Don’t make assumptions

Perhaps we were mistaken in our original assumption that DayCENT and other big agricultural models can be treated as black boxes for this kind of analysis.

We can’t build soil carbon sensors

The simplest theoretical solution, building soil carbon sensors, was not possible.

Tough to build trust in outputs

Trust in results was a major concern the whole time! Sometimes we looked at input and output histograms and said, ‘I don’t buy it.’

Impact metrics

Accelerating the adoption of regenerative practices requires reliable, low-cost soil carbon monitoring and verification (M&V). Pecan Street’s ultimate goal is to use the research outputs to lower the cost of M&V, which will allow more farmers to participate in programs that help them transition to regenerative practices.

“We included some of what we’ve learned on proposals we’re working on now – and we have no fear in proposing some of these and other similar techniques.”
“The Patrick J. McGovern Climate + Data Accelerator was a great program. The way they organized it, they let you flounder for a while and then pull you back out of the ditch. I’d do it again in a heartbeat if I could.”
– PSI Team



Future Direction:

There are currently no users for their product, but the team is committed to troubleshooting the model.

  • PSI is still trying to understand where major issues are coming from
    • Looking at methods for error collection in the data – currently a simple rule-based collection
      • A data analyst looks for errors each week
  • The team has not yet built another ANN from scratch on other datasets; instead they have started putting together a convolutional neural network with a larger version of the data and different training assumptions, which is hinting at better results
