Our Projects

Agriculture & Food Data Science

Lifting yields with spatial arrangement in agroforestry plots

Audit Data Science Initiatives, Regression & ANOVA

Challenge

Spatial factors matter for crops: how close shade trees should be, how widely to fertilize, or how large plots should be, to name a few considerations. But their impacts can be tricky to analyze, and trickier still in an agroforestry context where farm plots can be complex.

Solution

Using spatial statistical methods, we unpicked how factors like shade tree species and distance, soils, edge effects, and boundaries between plots all affected yield quality and quantity, as well as disease.

Applicability

These kinds of analyses help to direct our clients on how to optimize crop arrangement in their agroforestry plots to lift crop yields and suppress disease and pathogens.

Agriculture & Food Data Science

Extreme weather impacts on agrochemical product performance

Regression & ANOVA

Challenge

Agrochemicals deal with all the environmental stressors found in the field. Even the most robust products become less effective in extreme temperatures or periods of plant stress. Pinpointing which conditions are responsible is difficult given the complexity of climate and local environments.

Solution

Using only survey data and locations from end users, we can identify which extreme environmental changes impacted product performance by scraping climate data from the web. Working with these models, we can find critical thresholds specific to farming regions, such as temperature cutoffs.

Applicability

Such analyses are valuable to product development teams seeking to improve usage recommendations and to end users like farmers or orchardists, and they can help avoid unnecessary use of products under poor application conditions.

Agriculture & Food Data Science

Efficient smallholder farm planning with genetic algorithms

Machine Learning Model Validation

Challenge

Smallholder or family farms have limited space and often rely on a diverse base of saleable products. However, getting the best spatial positioning out of all profitable crops is a surprisingly difficult calculation, even if a computer is provided with perfectly accurate data.

Solution

Rather than trying to find the perfect layout, we have developed algorithms in which basic requirements such as crop diversity, annual budgets, and land surface can be set on a per-farm basis. These algorithms use an iterative, evolutionary process to return a range of options to inform planning.
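To illustrate the general idea of an evolutionary search over farm layouts, here is a minimal sketch using only the Python standard library. It is not the production tool: the crop names, profit and cost figures, penalty weights, and the `fitness` and `evolve` functions are all hypothetical, and land surface is reduced to a fixed number of equal plots.

```python
import random

# Hypothetical crops: (profit per plot, cost per plot). Illustrative values only.
CROPS = {
    "tomato": (120, 60),
    "lettuce": (70, 25),
    "beans": (80, 30),
    "squash": (95, 45),
}
NAMES = list(CROPS)


def fitness(layout, budget, min_crops):
    """Total profit of a layout, penalised when it breaks the budget or diversity rules."""
    profit = sum(CROPS[c][0] for c in layout)
    cost = sum(CROPS[c][1] for c in layout)
    penalty = 0
    if cost > budget:
        penalty += 10 * (cost - budget)
    n_distinct = len(set(layout))
    if n_distinct < min_crops:
        penalty += 200 * (min_crops - n_distinct)
    return profit - penalty


def evolve(n_plots=12, budget=500, min_crops=3, pop_size=60,
           generations=80, mutation_rate=0.1, n_options=5, seed=0):
    """Return up to n_options distinct layouts, best first."""
    rng = random.Random(seed)

    def score(layout):
        return fitness(layout, budget, min_crops)

    # Start from random layouts: one crop name per plot.
    pop = [[rng.choice(NAMES) for _ in range(n_plots)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_plots)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally replant a plot with a random crop.
            child = [rng.choice(NAMES) if rng.random() < mutation_rate else c
                     for c in child]
            children.append(child)
        pop = parents + children
    unique = {tuple(p) for p in pop}
    return sorted(unique, key=lambda p: (-score(p), p))[:n_options]


if __name__ == "__main__":
    for layout in evolve():
        print(fitness(layout, 500, 3), layout)
```

Returning several distinct high-scoring layouts, rather than a single winner, mirrors the goal of giving the grower a range of options to choose from.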

Applicability

These tools can be modified for any range of small-scale farming systems, including organic vegetables, urban farming systems, or intercropping trials. Due to their speed and flexibility, we can customize tools to provide reactive answers in real time.

Pest & Beneficial Species Management

Web apps for regional pest management

Model Deployment using Cloud Services

Challenge

Newly emerging pests require regionwide management strategies. Predicting the likely locations for spreading invasive pest species is critical for removal efforts.

Solution

Working with collaborators in agricultural extension and integrated pest management, we are developing a data pipeline for analyzing data on newly emerging pest insects. These forecasts are location-specific and provided as web tools to serve professionals in the plant protection industry and government.

Applicability

The cyberinfrastructure surrounding these tools can be applied to any emerging pest or pathogen with enough prior data. Our programming and domain knowledge developed on this project is ready-made for other agricultural or forest systems dealing with newly emerging devastating pests.

Pest & Beneficial Species Management

Predicting pea aphid outbreaks early enough to act

Invasive Species Monitoring

Challenge

Temperatures are often important for predicting pest outbreaks, especially in invertebrate pests. But weather dependence means these models can give too little warning of an outbreak for managers to intervene in time.

Solution

We used long-term climate data to model insect development based on calendar date rather than temperatures, providing far better forewarning to pest managers.
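One way to picture a calendar-date model is to accumulate degree-days over many past years and record the day of year on which a development target was reached. The sketch below does this on synthetic temperatures. It is an assumption about how such a model could be built, not the method used in the project, and the base temperature, degree-day target, and synthetic climate curve are all hypothetical.

```python
import math
import random
from statistics import mean, stdev

BASE_TEMP_C = 10.0   # hypothetical lower development threshold
TARGET_DD = 150.0    # hypothetical degree-days needed to reach the damaging stage


def crossing_day(daily_mean_temps, base=BASE_TEMP_C, target=TARGET_DD):
    """Day of year (1-based) when accumulated degree-days first reach target, else None."""
    total = 0.0
    for day, temp in enumerate(daily_mean_temps, start=1):
        total += max(0.0, temp - base)
        if total >= target:
            return day
    return None


def synthetic_year(rng):
    """Synthetic daily mean temperatures: a seasonal curve plus noise.

    A stand-in for real long-term weather records.
    """
    return [12.0 - 12.0 * math.cos(2 * math.pi * (d - 15) / 365) + rng.gauss(0, 3)
            for d in range(1, 366)]


def calendar_forecast(years):
    """Mean and standard deviation of the crossing day across historical years."""
    days = [d for d in (crossing_day(y) for y in years) if d is not None]
    return mean(days), stdev(days)


if __name__ == "__main__":
    rng = random.Random(0)
    history = [synthetic_year(rng) for _ in range(30)]
    avg_day, spread = calendar_forecast(history)
    print(f"Typical crossing day: {avg_day:.0f} (sd {spread:.1f} days)")
```

The spread across years indicates how much accuracy is traded for the earlier warning that a fixed calendar date provides.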

Applicability

The relationship between temperatures and date is highly site-specific. We validated our models with Bayesian methods to ensure confidence in calendar date-based predictions, and have repeated this approach to test suitability in other pest systems.

Pest & Beneficial Species Management

Model validation for the invasive insect emerald ash borer

Machine Learning Model Validation

Challenge

The insect emerald ash borer has ripped through forests in the eastern US, devastating native ash trees at an alarming pace. Current modeling efforts by academic and public scientists are critical for understanding how this invasive species impacts both forests and urban greenspaces.

Solution

We are working with state entomologists to validate spatial models associated with a 10-year dataset collected in the state of Connecticut. Using information gleaned from ecological data, maps are being produced to track impacts. To this end, we are providing GIS and statistical modeling consulting.

Applicability

Invasive species are one of the leading causes of biodiversity decline, and they also have significant impacts on economics and human health. Often one of the hardest challenges is just tracking the impacts of invasive species, especially if containment measures have not succeeded. Quick turnaround is important for these modeling efforts.

Pest & Beneficial Species Management

Management guidelines for Canada thistle from messy field data

Data Processing, Cleaning, Storage, Management, Regression & ANOVA

Challenge

Real world data is usually flawed and messy, not collected from carefully designed experiments. Answering questions with data that are missing controls or when treatments aren’t applied consistently is a serious challenge even for experienced analysts and requires bespoke solutions.

Solution

We reshaped the data around awkward treatments, enriched it by scraping Google Earth Engine, and used factor analysis to include as many climate and geospatial variables as possible. We showed broad correlations between pest management practices and landscape patterns, and reduced these down to guidelines for allocating management effort.

Applicability

Correlation is not as powerful as causation, but with our expertise we can use it to rescue a ‘bad’ dataset. Our clients don’t always have the perfect data to directly answer important questions – but we can get pretty damn close.

Pest & Beneficial Species Management

Ground-truthing general guidelines on local field data in spotted lanternfly

Machine Learning Model Validation

Challenge

Managers can tap into a wealth of pest management knowledge from out-of-state and overseas experiences. But pests adapt to local conditions, such as climate, so it’s important to ground-truth any external practices with local data from the field.

Solution

For an insect pest, we developed a statistical method to validate the temperature threshold for egg hatch reported by overseas managers against local outbreak data.

Applicability

We can adapt this approach to ensure that management models built on overseas systems are a good fit for local conditions, so practitioners can confidently take advantage of global knowledge.

Pest & Beneficial Species Management

Evaluating pest management policy against outcomes on the ground

Audit Data Science Initiatives, Structural Equation Modeling

Challenge

How are we doing, and where are we headed? These questions are key in an evolving situation like a new pest outbreak, where management resources are limited and failure could cripple a multibillion-dollar industry.

Solution

Relying on our biological expertise, we identified ‘natural experiments’ in the data arising inadvertently from management practices. We’ve used spatial and timeseries statistics to explore how different management decisions impact pest control outcomes, vindicating policies that are working and prompting a re-think for those that aren’t.

Applicability

We take pride in our highly consultative approach, sensitive to the needs of varied stakeholders. Coupled with our technical know-how, we partner with public and quasi-public sector agencies to measure and improve their effectiveness at tackling threats to industries and livelihoods.

Pest & Beneficial Species Management

A machine learning tool for smart disease surveys

Data Processing, Cleaning, Storage, Management, Model Deployment using Cloud Services, Machine Learning Model Validation, Invasive Species Monitoring

Challenge

It’s expensive and difficult to monitor range expansion in a novel disease, especially at the regional level. Surveys need to be informed by risk factors such as climate, human demographics, and transport vectors and corridors.

Solution

Through a framework for broad stakeholder engagement, we’re building a tool to help guide targeted surveys that ensure that new disease outbreaks are quickly and efficiently contained. We use machine learning methods and our web-scraping expertise to integrate an exhaustive array of predictor variables.

Applicability

The resulting tool will deploy via a web app front-end, designed with survey practitioners in mind.

R&D Team Support

Simulating economic outcomes in diversified farms

Model Deployment using Cloud Services

Challenge

Ag professionals are interested in trying new practices to diversify farms. However, given the extremely large number of potential changes to crops and product usage, it's difficult to estimate long-term outcomes.

Solution

We developed a straightforward tool to simulate 5-year outcomes based on starting conditions. Rather than generating a forecast, this tool serves as a way to look at "what-if" scenarios where growers can see a range of potential economic outcomes based on both large and small management decisions.
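As a rough picture of what a "range of outcomes" simulation can look like, the sketch below runs a simple Monte Carlo over five years of uncertain annual margins and reports low, median, and high outcomes. It is an illustrative assumption, not the tool described above: the `simulate_farm` function, the normally distributed margins, and all monetary figures are hypothetical.

```python
import random
from statistics import quantiles


def simulate_farm(start_cash, annual_margin, margin_sd,
                  upfront_cost=0.0, years=5, n_runs=5000, seed=42):
    """Return the 5th, 50th and 95th percentiles of cash on hand after `years`."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_runs):
        cash = start_cash - upfront_cost
        for _ in range(years):
            # Each year's margin is drawn independently (a simplifying assumption).
            cash += rng.gauss(annual_margin, margin_sd)
        finals.append(cash)
    cuts = quantiles(finals, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return {"p5": cuts[0], "p50": cuts[9], "p95": cuts[18]}


if __name__ == "__main__":
    # Scenario A: keep the current system. Scenario B: diversify, with an
    # upfront cost, a higher expected margin, and more year-to-year variability.
    print("current    ", simulate_farm(10_000, 4_000, 1_500))
    print("diversified", simulate_farm(10_000, 5_000, 3_000, upfront_cost=3_000))
```

Comparing the two printed ranges side by side shows both the potential gain and the added risk of a management change, which is the point of a what-if view as opposed to a single forecast.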

Applicability

A diversified farm economic outcomes tool will be helpful to illustrate the potential gains and risks when adopting new varieties, rotations, or intercropping strategies. Such a tool will also help organizations marketing to local stakeholders interested in diversified agriculture.

R&D Team Support

Designing efficient experiments with power analysis

Power Analysis & Sample Size

Challenge

Experimentation is critical to successful product R&D, but sample size is expensive – too many samples is a costly waste, but with too few samples results can be inconclusive. In R&D, we want to find the sweet spot to balance statistical power against sample size.

Solution

We’ve used state-of-the-art statistical models to squeeze as much power from our clients’ pilot data as possible. But power analysis with these models is tricky – we use simulations to estimate the sample sizes needed to scale up to full size experiments that can answer their questions using these models.
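Simulation-based power analysis follows the same recipe whatever the model: simulate many experiments at a given sample size, analyze each one, and count how often the effect is detected. The sketch below shows that recipe with a deliberately simple stand-in analysis, a two-sample z-test under a normal approximation, rather than the more advanced models mentioned above. The function names, effect size, and candidate sample sizes are hypothetical.

```python
import random
from statistics import NormalDist


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulated_power(n_per_group, effect, sd=1.0, alpha=0.05, n_sims=1000, seed=1):
    """Fraction of simulated two-group experiments in which the effect is detected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        m_c, v_c = mean_and_var(control)
        m_t, v_t = mean_and_var(treated)
        se = ((v_c + v_t) / n_per_group) ** 0.5
        # Two-sided z-test (normal approximation; assumes moderately large n).
        if abs(m_t - m_c) / se > z_crit:
            hits += 1
    return hits / n_sims


def smallest_n(effect, target=0.8, candidates=range(5, 205, 5), **kwargs):
    """Smallest candidate sample size per group whose simulated power reaches target."""
    for n in candidates:
        if simulated_power(n, effect, **kwargs) >= target:
            return n
    return None


if __name__ == "__main__":
    print("power at n=30 per group:", simulated_power(30, effect=0.5))
    print("smallest n for 80% power:", smallest_n(effect=0.5))
```

Swapping the z-test for a richer model changes only the analysis step inside the loop; the simulate, analyze, and count structure stays the same.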

Applicability

Our approach can need as little as a tenth of the sample size required by industry staples like the t-test. A tenth of the sample size means a tenth of the cost, and 10x bang for your buck.

R&D Team Support

Data analytics for sustainability: improving cacao yield with hand pollination

Model Deployment using Cloud Services, Structural Equation Modeling

Challenge

Crops can often be pollination limited. To maximize yields, the correct timing and intensity of pollination services can lead to significant payoffs. In cacao production, hand-pollination can impact the size, number, and quality of fruit simultaneously. We have developed structural equation models to leverage large-scale, long-term data to determine the ideal hand pollination rates for cacao trees.

Solution

We have built a framework for cleaning and analyzing farm data from orchards and other cultivated trees and developed a Python pipeline for fitting statistical models quickly and accurately.

Applicability

The resulting outputs completed by EcoData Technology have been developed into a proprietary web application to meet the needs of end users in the cacao industry.

R&D Team Support

Comprehensive performance evaluation in agrochemical development

Regression & ANOVA

Challenge

Before they launch, a business needs a complete picture of how a new product performs, including against their existing brands. While industry solutions are easy to use, their analyses are broad and can be hard to fine-tune to the quirks of each product.

Solution

We built a data pipeline to extract data stored in a non-standard proprietary format. From there we worked closely with our client to design multiple indicators of performance spanning the whole lifecycle of their new pesticide. We delivered comprehensive analyses on the strength and duration of product effectiveness compared to older formulations, and translated the results into practical guidelines for their customers.

Applicability

We can adapt our pipeline to other industry databases (such as ARM), free your data from these ecosystems, and perform specialized analyses to support every step of new product development.

Structural Equation Modeling
Regression & ANOVA
Power Analysis & Sample Size
Model Deployment using Cloud Services
Machine Learning Model Validation
Invasive Species Monitoring
Data Processing, Cleaning, Storage, Management
Audit Data Science Initiatives