Open Contracting Partnership
Intelligent Data Ecosystem in Assam – Flood Response and Management (IDEA-FRM)
Vision: IDEA-FRM is an intelligent data model created by Open Contracting Partnership (OCP) & CivicDataLab (CDL) that enables decision-makers to improve disaster response and relief procurement to protect the poorest and most vulnerable from the worst effects of floods.
Records Digitization, Tracking and Transparency, Data-driven Decision Making
“Good data on how much is spent and in which communities is critical to understanding whether these investments reach the areas most affected by floods. Better data is fundamental for creating effective tools to manage disasters and reduce risks.”
High quality geolocated government spending data is necessary to understand whether intended investments reach areas most affected by floods. In Assam, India, this data is siloed and scattered across different online and offline platforms, and cannot be easily accessed, analyzed, and used to inform decision-making by responsible government agencies.
Geospatial and satellite data to assess the exposure, severity, and propensity of floods in different areas of the state.
Socio-economic data to assess the vulnerability and resilience capacity of different regions and communities.
Fiscal data, such as public contracts and spending to assess past actions and investments against floods and inform future requirements.
Vulnerability Mapping, Data Modeling, Open Source
Policy Makers, Disaster Managers, Humanitarians
Engaging with target users
“We had regular meetings with the target user, which helped a lot, as they gave us inputs we may have missed as well as requirements. The theory was very important, and we needed to get it right. A key for us is to make sure that our models are used, and for a specific purpose. Beauty of this work was that from the start we had input and they had a vested interest. We built around their needs.”
Primary Members of the Team
- Data Engineer
- Geospatial Data Scientist
- Public Finance Researcher
- Project Lead & Government Relations Coordinator
- Consulting support from Data Scientist
The partnership capitalized on each group’s strengths:
- OCP: experienced in government procurement data
- CDL: experienced in geospatial data coding, data engineering
Vulnerability assessments and climate work were new to both teams
Combining fiscal, socioeconomic and geospatial datasets will provide insights into whether government responses to floods are reaching areas of greatest need in a timely manner, and lead to a more complete understanding of flood risk and impacts, guiding future preparedness and response actions. An intelligent data model can ensure the interoperability of freely available datasets for evidence-based decision-making in disaster preparedness and response.
“There was a feeling very early on that together we had hit on something promising here”
Nearly 800 million people live in high flood risk areas globally
The World Resources Institute estimates that nearly 800 million people live in high flood risk areas globally. Located in the shadow of the Himalayas, the Indian state of Assam is one of the world’s most flood prone areas, with nearly half of its territory prone to annual inundation. Flooding there in 2022 caused nearly 200 deaths and impacted over 8 million people.
Flooding’s potential to trigger multiple global humanitarian crises is exacerbated by our changing climate.
A deluge from an atmospheric river or a typhoon strikes quickly and often with unforeseen magnitude. A swift and targeted government response is the best hope to avoid unnecessary death, displacement, and catastrophic infrastructure damage from these disasters. Recently adopted global frameworks underscore the importance of country and city commitments to data-informed disaster risk reduction strategies and implementation of policies to reduce vulnerabilities and improve climate resilience.
Technology to inform better risk reduction strategies is unavailable in much of the global south
Unfortunately, high-quality, machine-readable and interoperable data to inform better risk reduction strategies is unavailable in much of the global south, including in many of the places most prone to floods, such as Assam. These data issues and gaps can lead to inefficiencies in resource allocation in places with already stretched budgets, resulting in the adoption of bad policies or ad hoc responses that fail to address urgent needs during emergencies.
Determined to address these issues where the problems were acutely felt, the team at CivicDataLab called upon pre-existing trusted relationships with the government in Assam, and connected with both the state’s finance department and Assam State Disaster Management Authority (ASDMA). This helped the team to better understand the challenges and gain the confidence and buy-in required to address them. They set out to gather and combine data and use it to develop tools to help these decision makers do a better job preparing for disasters.
How It Started:
The partner teams had worked together prior to joining the Data Practice’s Data to Drive Climate Action Accelerator cohort at PJMF, through an innovation challenge that Open Contracting Partnership (OCP) sponsors. CivicDataLab (CDL) first submitted a proposal to build IDEA-FRM, the intelligent data model, to address impacts from flooding in Assam and went on to earn the top prize.
“At CivicDataLab all projects are about making public data more transparent, accessible, and usable.”
The project operated at the intersections of procurement, climate resilience, equity, and data, all closely aligned with the mission of OCP to make public financing fair and efficient. The partners saw potential to scale this approach beyond the first use case in Assam, and an opportunity to showcase the work in Assam as a demonstration project and expand its impact if adopted elsewhere.
At first it was a struggle for the team to acquaint themselves with some of the technology required to build a functional and useful data model – they had never built one before – let alone make one that was intuitive, accessible and trusted by authorities.
Together with advisors, the team identified a primary target user to test and implement the model: the CEO of the Assam State Disaster Management Authority (ASDMA). Fortunately, he was already technically savvy and curious to learn about the new decision support tools alongside the team in an iterative, co-designed approach.
“We moved forward incrementally, and did not begin the engagement with our proposed data solutions. We began by trying to understand the ecosystems, potential use cases, and then came to a workable solution together. That approach allowed our users to grow with us, instead of being overwhelmed by jargon.”
The team identified relevant flood related data in five broad categories:
- Satellite and weather data: To understand floods as a function of various natural factors like rainfall trends, distance to rivers, elevation, slope, drainage density, vegetation density, built density, soil and lithology;
- Demographic data: To understand how floods interact with settlements and determine the impact on human lives and livelihoods looking at various social and economic factors;
- Access to infrastructure: To understand the vulnerability of regions as a function of infrastructure access to cope with floods;
- Damages: To understand the trends of flood impacts in the regions historically;
- Government response: To understand how the government has responded to floods and where the gaps might be through public procurement data.
All the raw data from the five identified categories was cleaned, formatted and transformed for ingestion into the analytics platform. This process included consolidating or, in certain cases, separating fields and columns, changing formats, assigning unique identifiers, deleting unnecessary data and correcting errors. This data is now available to the public in the IDEA-FRM repository on GitHub.
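The cleaning steps described above can be illustrated with a minimal sketch. It is not the team's pipeline: the column names, sample rows, and the `REC-` identifier scheme are all hypothetical, chosen only to show trimming fields, changing formats, dropping unusable or duplicate records, and assigning unique identifiers.

```python
import csv
import io

# Hypothetical raw export; column names and values are invented for illustration.
RAW_CSV = """District ,Revenue Circle,Tender Date,Amount (INR)
 Barpeta,Sarthebari ,05/07/2022,"1,20,000"
barpeta,Sarthebari,05/07/2022,"1,20,000"
Nagaon,Kampur,2022-08-11,85000
,Kampur,2022-08-12,
"""


def normalise_date(value):
    """Convert DD/MM/YYYY to ISO YYYY-MM-DD; leave other values unchanged."""
    value = value.strip()
    if "/" in value:
        day, month, year = value.split("/")
        return f"{year}-{month.zfill(2)}-{day.zfill(2)}"
    return value


def clean_rows(raw_text):
    """Return cleaned, de-duplicated records, each with a unique identifier."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        # Strip stray whitespace from both header names and values.
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        district = row["District"].title()
        amount_text = row["Amount (INR)"].replace(",", "")
        # Delete records that lack the fields needed for analysis.
        if not district or not amount_text:
            continue
        record = {
            "district": district,
            "revenue_circle": row["Revenue Circle"].title(),
            "tender_date": normalise_date(row["Tender Date"]),
            "amount_inr": int(amount_text),
        }
        # Skip exact duplicates once formats have been harmonised.
        key = tuple(record.values())
        if key in seen:
            continue
        seen.add(key)
        # Assign a unique identifier (hypothetical scheme).
        record["record_id"] = f"REC-{len(cleaned) + 1:04d}"
        cleaned.append(record)
    return cleaned


cleaned = clean_rows(RAW_CSV)
for record in cleaned:
    print(record)
```

Harmonising formats before de-duplicating matters here: the first two sample rows only become recognisable as the same record once case and whitespace have been normalised.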
Most flood models are based on predictive modeling, typically from satellite and weather data. The team’s work was also around public procurement and how it relates to infrastructure, loss and damages. The first three months of work were spent mapping flows, ownerships, categories of data, which were collated by different agencies in different formats. They needed to understand the data ecosystem and ensure interoperability.
“There was demand from stakeholders for a flat prediction model. We tried different ML techniques to see how this would work. A stakeholder meeting with Assam had identified a need to better predict where funds should be distributed based on risk and vulnerability.”
“A flat prediction model, also known as a flat model or a shallow model, refers to a machine learning model that consists of a single layer of neurons or a linear equation. These models are relatively simple and have limited capacity to learn complex patterns or relationships in data.”
Unlike deep learning models, such as deep neural networks, which have multiple hidden layers, a flat prediction model does not have the ability to automatically extract hierarchical features or representations from the input data. It typically operates by directly mapping input features to output predictions using a linear equation or a simple mathematical function.
Examples of flat prediction models include linear regression and logistic regression. Classical methods such as support vector machines (SVMs) and decision trees are often grouped with them, although they are not strictly linear. These models are often used in scenarios where the data is relatively simple and the relationships between input features and output predictions are straightforward.
While flat prediction models may not offer the same level of predictive performance as deep learning models on complex tasks, they have the advantage of being computationally efficient, interpretable, and easier to train and deploy. Their simplicity makes them suitable for certain applications where a complex model is not necessary or where interpretability is crucial.
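To make the idea of a flat model concrete, here is a minimal sketch of logistic regression, one of the examples named above, written with the Python standard library only. The two input features (scaled rainfall and scaled elevation), the tiny training set, and the hyperparameters are hypothetical and are not drawn from the team's data or models.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Flat model: a single weighted sum of the inputs passed through a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit the weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training data: [rainfall, elevation], both scaled to 0..1.
# Label 1 means the area flooded, 0 means it did not.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(X, y)
print("weights:", weights, "bias:", bias)
print("wet lowland:", predict_proba(weights, bias, [0.85, 0.15]))
print("dry upland:", predict_proba(weights, bias, [0.15, 0.85]))
```

The learned weights can be read directly: a positive weight on rainfall and a negative weight on elevation. That readability is the interpretability advantage described above.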
Tesseract, Open Computer Vision, Cheyyali, NER
“We identified flood-related procurements using keywords as well as by sorting by the procuring agency or department. We are converting tender documents to machine-readable formats using Tesseract and Open Computer Vision (CV). We then annotate them using Cheyyali, an open-source annotation tool, to identify patterns of information that can be extracted by training Natural Language Processing models such as Named Entity Recognition (NER) and entity relationship models. We have also started geospatializing these tenders and have added to this dataset the offline procurements done in emergencies for floods in our pilot districts. After listing the sources, we classified the datasets based on how frequently they are updated or released. Each frequency bucket is handled differently to source the data. For the low-frequency datasets we ran one-time scrapers, whereas for high-frequency datasets (which get updated in less than a month’s time) we are developing end-to-end pipelines using Python and R.”
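The two sorting steps mentioned in the quote above, flagging flood-related tenders and bucketing datasets by update frequency, might look roughly like the following sketch. The keyword list, department set, sample tenders, and the 30-day reading of "less than a month" are all assumptions made for illustration, not the team's actual rules.

```python
# Hypothetical keyword and department lists; the team's real lists are not given.
FLOOD_KEYWORDS = ("flood", "embankment", "erosion", "relief")
FLOOD_DEPARTMENTS = {"Water Resources Department"}


def is_flood_related(tender):
    """Flag a tender by keyword match in its title or by procuring department."""
    title = tender["title"].lower()
    if any(keyword in title for keyword in FLOOD_KEYWORDS):
        return True
    return tender["department"] in FLOOD_DEPARTMENTS


def frequency_bucket(update_interval_days):
    """Treat datasets updated in under a month (assumed 30 days) as high frequency."""
    return "high" if update_interval_days < 30 else "low"


# Hypothetical sample records.
tenders = [
    {"title": "Repair of embankment breach", "department": "Public Works"},
    {"title": "Supply of office furniture", "department": "Education"},
    {"title": "Desilting of channel", "department": "Water Resources Department"},
]
flood_tenders = [tender for tender in tenders if is_flood_related(tender)]

# Hypothetical dataset names mapped to their typical update interval in days.
datasets = {"daily_rainfall": 1, "census_tables": 3650}
buckets = {name: frequency_bucket(days) for name, days in datasets.items()}

print(len(flood_tenders), "flood-related tenders")
print(buckets)
```

In this reading, a low-frequency dataset would be fetched once by a scraper, while a high-frequency one would be wired into a scheduled pipeline.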
“A lot of different things have opened up for us now. We had not ventured before into dynamic modeling and ML techniques. It increased our whole data playing field!”
“The accelerator’s flexibility for experimentation was very useful. The support helped test boundaries of what is realistic and viable much like a peer review.”
“The technology looked very inaccessible to us a year ago, and now it doesn’t”
multidimensional model framework
Flood Risk Assessment Model, integrating a machine learning model to predict the probability of flood occurrences and weighing it against vulnerability and access to infrastructure (coping capacity) to assess risk.
Flood Preparedness Model, employing a statistical multivariate model to assess preparedness for floods by combining all the datasets to identify the places that need to be better prepared to face floods. This employed Structural Equation Modelling (SEM), which is a multivariate, hypothesis-driven technique used to assess structural relationships.
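As a rough illustration of how the Flood Risk Assessment Model described above weighs a predicted flood probability against vulnerability and coping capacity, consider the sketch below. The source does not give the team's formula; the multiplicative form used here, the 0 to 1 scaling of every input, and the region names and values are assumptions chosen only to show the idea.

```python
def risk_score(flood_probability, vulnerability, coping_capacity):
    """Combine hazard, vulnerability and coping capacity into one score.

    Assumed form (not the team's published formula):
        risk = probability * vulnerability * (1 - coping capacity)
    All inputs are expected to be scaled to the range 0..1.
    """
    for name, value in (
        ("flood_probability", flood_probability),
        ("vulnerability", vulnerability),
        ("coping_capacity", coping_capacity),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return flood_probability * vulnerability * (1.0 - coping_capacity)


# Hypothetical regions: (flood probability, vulnerability, coping capacity).
regions = {
    "Circle A": (0.8, 0.5, 0.5),
    "Circle B": (0.8, 0.9, 0.2),
    "Circle C": (0.3, 0.9, 0.2),
}

scores = {name: risk_score(*values) for name, values in regions.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 3))
```

Under this assumed form, two regions with the same flood probability can rank very differently once vulnerability and access to infrastructure are taken into account.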
“Initially we worked with prediction modeling, but factors that define floods were unpredictable, erratic. Assam is close to international borders. A dam opening in China that causes a flood downstream cannot be predicted. Despite making a great prediction model, we realized that some of the other flood models have used 20-30 years of private training data. Because we operate outside of the government, we are trying to limit ourselves to the use of open tools. Even if we had the best data, it wouldn’t have made a difference. In addition, for many of the flood models, risk indices are driven by expert opinion. Every researcher has their own rationale for vulnerability and they’re all different and subjective.”
“We asked ourselves ‘How can modeling inform extent and absorb subjectivity?’ The technique of SEM had never before been applied to this use case, so we set out to try it ourselves. Ultimately it allowed us to move away from the subjective bias of individuals.”
“The benefit of using an SEM over other multivariate techniques is that SEM gives us the opportunity to construct unobservable latent variables and predict their values. After the model is fit, we can predict values of each of the latent variables from the input data. While this can be done without SEM by manually giving weightages to each of the variables, SEM generates these weightages from the data and the relationships between the variables themselves. At the same time, SEM allows for more explainability between the variables than a prediction-based Machine Learning Model.”
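For readers unfamiliar with SEM, the general textbook form of such a model is shown below. This is the generic notation, not the team's own model specification, which is not given here.

```latex
\begin{aligned}
x &= \Lambda_x \xi + \delta          && \text{(measurement model, exogenous indicators)} \\
y &= \Lambda_y \eta + \varepsilon    && \text{(measurement model, endogenous indicators)} \\
\eta &= B \eta + \Gamma \xi + \zeta  && \text{(structural model)}
\end{aligned}
```

Here $x$ and $y$ are observed indicators, $\xi$ and $\eta$ are the unobservable latent variables, and $\delta$, $\varepsilon$ and $\zeta$ are error terms. The loading matrices $\Lambda_x$ and $\Lambda_y$ play the role of the "weightages" mentioned in the quote: they are estimated from the data rather than assigned by hand. $B$ and $\Gamma$ capture the hypothesised relationships among the latent variables.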
The team’s final technical project report includes their model outputs and tactical frameworks
Government data is often siloed and scattered across different online platforms.
“In our field research, we realized that data may not even be available electronically. What data is available is often not ready for processing: it is in inaccessible formats, not machine-readable, and often hidden behind captchas.”
Administrative boundary changes can also make it difficult to track changes
The spatial extent and resolution of the maps had to be standardized to match that of the study area.
Not all datasets used the same coordinate reference system and we had to reproject references as part of preparing the data for analysis.
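Reprojection means converting coordinates from one coordinate reference system to another so that layers line up. The sketch below shows one well-known case, WGS 84 longitude and latitude to spherical Web Mercator, using only the standard library. The source does not say which reference systems the team worked with, so this pair is an assumption, and in practice a library such as pyproj or GDAL would usually handle the conversion.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, in metres


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate, illustrative coordinates for a point in Assam.
x, y = wgs84_to_web_mercator(91.74, 26.14)
print(f"x = {x:.1f} m, y = {y:.1f} m")
```

Every dataset would need to pass through the same target projection before spatial joins or raster overlays, otherwise features that describe the same place would not coincide.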
A history of collaboration with agencies that play a crucial role in responding to disasters.
“Our existing relationships, built over time, have been instrumental in driving progress. Our prior engagement with the government of Assam built significant goodwill and trust which in turn, incentivized officials to participate in our project activities. Working with the Finance Department enabled us to get consolidated procurement data of previous 5 years which is not easily accessible to the public. This also gave us credibility and helped us reach new stakeholder groups involved in or otherwise affected by floods to enrich our understanding and inform our data model, ensuring that it reflects the on ground realities and needs.”
A large number of emergency procurements of relief materials happen offline.
The team accessed this data for their pilot district following interviews with government stakeholders. Understanding process and pain points helped them to scale up the project and analyze the outcomes.
Capacity issues exist: government users need training and an intuitive user interface.
Governments do not always have the capacity to run models. Because of this, the team is trying to productize the data model so that it can also be run in two other states, Orissa and Himachal Pradesh, and internationally.
ASDMA tested the data model in their decision making process before the last flood season.
A preliminary correlation analysis revealed that 10 new projects were approved in 10 of the most vulnerable counties that were prioritized by IDEA-FRM
They found significant efficiency gains from the model; to do similar work would require thousands of volunteers to collect data over 4 months. IDEA-FRM reduces this effort to 2-3 people over the course of 3-4 days. Model outputs are shared in this CDL blog post.
The team believes that this is the first time procurement data has been studied comprehensively in the context of disaster management efficiency.
“The replication potential of IDEA-FRM is very exciting, and this could work in different places. There are new places in flood-prone states, and Thailand wants to replicate it. We don’t have this yet, but if and when it is replicated this could be hugely successful. Everything is designed to be open source and replicable.”
“This new project was a great opportunity to venture into a new domain. We had been working on this for a while, and our journey has not stopped. The Patrick J. McGovern Foundation really did its job to accelerate our work”