The National Agricultural Image Repository (NAIR) is a large-scale, multi-state effort to build and share weed and crop images for training artificial intelligence (AI).
The repository will be open access, meaning publicly available, which is the first step toward faster and cheaper development of precision ag solutions. How? Simply put, open access to a resource like this repository makes it less likely that one group will monopolize the technology and increases the number of eyes working on a problem.
Ultimately, an open-access image repository can advance the development of “smart” precision technologies that can increase efficiency, lower costs, and improve the overall sustainability of farming practices, while also promoting open and collaborative research.
Get Big or Get Out
Datasets for AI need to be big.
The more images that machine learning algorithms are exposed to, the better they become at recognizing patterns and making predictions. For example, if an algorithm is trained on a small dataset of Palmer amaranth and cotton images, it may not be able to tell the weed from the crop. But if a large dataset that contains diverse examples of these two species – such as different appearances, times of day, growth stages, and stress symptoms – is used for training, the algorithm is more likely to make accurate identifications. When important weeds are correctly identified for spraying, the result is more reliable management and better yield.
Large image datasets provide a greater level of diversity, which is important for algorithms to learn to generalize and handle variations in the images. This is especially important in complex domains like agriculture and weed recognition, where variations in soil, weather, and other environmental factors can impact the growth of crops and the appearance of weeds.
How much data do we need to collect? The answer is… a lot!
Actually, that’s a trick question. It implies some static, concrete number and end date. A better question is, “How can we continuously capture data?” Millions of images are needed, captured across different backdrops, conditions, growth stages, species, and an ever-growing list of situations. It’s a dynamic process that never ends, since these datasets have to be inspected, updated, maintained, and cleaned. The image dataset has to be balanced, and structural imperfections have to be identified and corrected. This means adding, removing, or reshaping data.
Developing the dataset is like continuously massaging and shaping a large ball of clay into a vase. Except, unlike a vase, the target keeps moving: weeds are constantly evolving to outsmart whatever management strategy humans use and will continue to do so, even with AI.
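For readers who want to see what "balancing" looks like in practice, here is a minimal sketch of one such check: measuring how much of a dataset each species makes up and flagging the ones that are underrepresented. It is an illustration only, not NAIR's actual tooling. The species labels, the function names, and the 20 percent threshold are all hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Return each label's share of the dataset as a fraction of all images."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def underrepresented(labels, min_share=0.2):
    """List labels whose share falls below min_share (a hypothetical threshold)."""
    shares = class_balance(labels)
    return sorted(label for label, share in shares.items() if share < min_share)


# Hypothetical labels for a tiny dataset; a real repository would hold millions.
labels = ["cotton"] * 6 + ["palmer_amaranth"] * 3 + ["morningglory"] * 1

print(class_balance(labels))
print(underrepresented(labels))
```

A species flagged this way is a candidate for more image collection, which is one concrete form the "adding" step can take.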
Why Ag Needs This Resource
Despite how important these large image datasets are, few exist for agriculture. Publicly available datasets in ag today are limited, developed mostly by small research groups and comprising only one to two thousand images and a couple of species.
There’s a good reason for this: Developing these datasets is not easy. It’s expensive, time-consuming, and tedious to capture, label, and update all the images needed. Often only massive companies like Google, Microsoft, and YouTube can spend the hundreds of millions of dollars necessary to capture and label images. Several large general-purpose datasets have been made publicly available, such as ImageNet (about 1.28 million training images in its widely used ILSVRC subset, more than 14 million overall)1, Microsoft’s COCO (328,000 images)2, and Google’s Open Images Dataset (~9 million images)3.
Publicly available datasets that democratize access to high-quality images allow for more widespread use and rapid development of “smart” technologies, that is, software applications that use artificial intelligence and machine learning to perform tasks that would typically require human-level intelligence. When more people can use the data, we all benefit. But when companies silo data behind closed doors, they stall progress for others and themselves. This has been the case for precision agriculture, where an absence of large datasets has become a major bottleneck for advancing artificial intelligence, particularly in weed recognition and precision management applications.