The Machine Learning Behind the Biome App

January 23, 2023

The Biome app leverages state-of-the-art computer vision and regression methods. Our team is developing rigorous models to considerably advance forest inventory practices.

Biome is an iOS application for measuring trees in the field and creating forest inventories, an essential component of carbon accounting. Smartphones are well suited for capturing tree statistics such as diameter, height, and species, along with geolocation. In addition to recording these measurements, Biome organizes the data in a convenient format for researchers and technicians. While still under development, Biome represents a significant advancement in forest inventory practices, which currently require manual measurements using measuring and flagging tapes, stakes, clinometers, and notebooks. By digitizing this process, Biome can standardize and scale the inventorying of forests. Traditional forest inventories are prone to user error and data loss, and methods vary across different regions of the world. A mobile application that tackles these challenges will not only enable larger, more standardized datasets but also allow untrained users to contribute to forest restoration projects.

Measuring DBH

In a typical forest inventory, a field worker uses a measuring tape to measure the diameter of a tree at a height of 1.3 meters. Scientists refer to this as “diameter at breast height” or “DBH”, as it represents a diameter measurement of the tree trunk taken at the height of an average human’s chest. With the Biome app, capturing a DBH measurement is as easy as taking a picture of a tree. This represents a significant improvement over traditional methods. A measuring tape is cumbersome and may require two people for large trees. Tree trunks may also bear spikes or house insects such as venomous ants, so not needing to touch or even approach them is a significant advantage. Biome’s biggest advancement, though, is its speed. During our recent field trip to Panama, our team practiced measuring sample plots of 5 and 10 meters in diameter. We split into two teams: the first was a team of three using traditional methods, and the second was a single person using the application. From this simple experiment, it quickly became clear that the Biome app was approximately five times faster than three people taking manual measurements for the same plot.
This palm tree with large spikes is extremely difficult to measure with DBH tape.
Measuring this spiky palm tree with Biome is easy. The segmentation does a good job even with the irregular trunk.
Our training set includes "false positives", like this pole on a sidewalk.
We batch-labeled the images and then augmented the dataset, for a total of 3,030 image/mask pairs. We store the datasets on Activeloop.
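As a rough illustration of the augmentation step: spatial transforms must be applied identically to the image and its mask, while photometric transforms touch only the image. This is a minimal numpy sketch with invented transforms, not our actual augmentation pipeline:

```python
import numpy as np

def augment_pair(img, mask):
    """Yield variants of an image/mask pair. Spatial transforms are
    applied to both arrays; photometric transforms to the image only."""
    yield img, mask
    # Horizontal flip: applied to image AND mask so they stay aligned
    yield np.fliplr(img), np.fliplr(mask)
    # Brightness jitter: image only, mask unchanged
    bright = np.clip(img.astype(np.float32) * 1.2, 0, 255).astype(np.uint8)
    yield bright, mask

img = np.arange(12, dtype=np.uint8).reshape(3, 4)
mask = (img % 2).astype(np.uint8)
variants = list(augment_pair(img, mask))
print(len(variants))  # -> 3
```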

How is that possible?

We have two machine learning models that make this possible: a binary semantic segmentation model and a regression model.

A binary semantic segmentation model is used to localize the trunk at a pixel level in the camera image, allowing us to determine the width of the trunk. Our segmentation model is robust enough to recognize all types of trees, even oddly shaped ones that are common in the dense jungle environments where carbon projects take place. Many trees have bent trunks, spiky trunks, or vines growing around them, so we trained our model on a diverse selection of trees.
In addition, our model was specifically trained to avoid “false positives”. When the user takes a picture of a post, a bottle, or a lamp, it won’t be detected as a tree. The model also focuses on a single tree in the foreground. This is important for the measurement process which can be hindered by multiple trees being grouped together. 
After several iterations, the segmentation performs well for a wide range of tree trunks, including trunks with vines, thorns, or spikes.
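To make the first step concrete: once a binary mask is available, the trunk's pixel width at breast height reduces to finding the horizontal extent of trunk pixels on the relevant image row. A minimal sketch (how the app selects the row and handles outliers is not described here, so this is an assumption):

```python
import numpy as np

def trunk_pixel_width(mask: np.ndarray, row: int) -> int:
    """Width of the trunk in pixels at a given image row.

    `mask` is a binary segmentation mask (H x W) where 1 marks trunk
    pixels; `row` is the row corresponding to breast height.
    """
    cols = np.flatnonzero(mask[row])
    if cols.size == 0:
        return 0  # no trunk detected on this row
    # Span from the leftmost to the rightmost trunk pixel
    return int(cols[-1] - cols[0] + 1)

# Toy 5x8 mask with a 3-pixel-wide "trunk"
mask = np.zeros((5, 8), dtype=np.uint8)
mask[:, 3:6] = 1
print(trunk_pixel_width(mask, row=2))  # -> 3
```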
Biome also uses a regression model trained to correlate the diameter of a tree with the pixel width of the segmentation mask and the distance from the phone’s camera to the tree trunk. We can calculate the distance easily using the Lidar sensor on the most recent iPhones. The average error of our model on our test set was 1.66 cm. The test set was created by measuring several trees’ DBH with a DBH tape, covering a wide range of diameters (5-140 cm) in order to test the model across widths.
This graph shows the differences between DBH values calculated by Biome and manual measurements of the same trees. The goal is for the DBH measured with Biome to be as close as possible to the DBH measured by hand (green line). The closer the data points are to this line, the smaller our error relative to traditional methods. In this case, the Biome measurements match the hand-measured DBH reasonably well, except for the highest data point, which was measured manually at 140 cm while the Biome app estimated the diameter at ~118 cm.

The larger errors for very small and very large trees reflect the operating range we initially assumed when gathering data. To improve the model, we plan to gather data outside the 10-100 cm range. There are many useful applications for measuring trees outside this range, for example measuring small saplings in their first few years of growth.
Example of two trees’ (DBH = 41 cm and DBH = 5 cm) RGB images along with their corresponding depth maps.
We are constantly exploring new methods to capture data. Depth maps generated by iPhones with Lidar capabilities also open up a new range of possibilities, potentially allowing us to use a neural network to calculate DBH in a single step, as opposed to the two-step process described above.

Tree Species Classification

While the efficient measurement of tree diameters is essential, we also need to measure other critical variables. Classifying tree species correctly allows us to utilize tree-specific allometric equations and create more accurate measurements of biomass. We ultimately hope to reliably identify tree species using the Biome app. To solve this tree species classification challenge, we are currently working with local experts to create tree species datasets we can use to train AI models. The model we are currently experimenting with is based on a computer vision method called image similarity. Given an input image provided by the user, an image similarity model retrieves the top N most similar images from the tree species dataset.
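The retrieval step behind image similarity can be sketched as cosine similarity between embedding vectors. In practice the embeddings come from a trained vision backbone; the vectors, dimensions, and gallery below are placeholders:

```python
import numpy as np

def top_n_similar(query: np.ndarray, gallery: np.ndarray, labels, n=3):
    """Return the labels of the n gallery embeddings most similar to
    the query, ranked by cosine similarity."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity per gallery image
    order = np.argsort(-sims)[:n]     # indices of the n highest scores
    return [labels[i] for i in order]

labels = ["Iriartea deltoidea", "Brosimum alicastrum", "Celtis schippii"]
gallery = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])  # toy 2-D embeddings
query = np.array([0.9, 0.1])

print(top_n_similar(query, gallery, labels, n=2))
# -> ['Iriartea deltoidea', 'Brosimum alicastrum']
```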

We are also running experiments to determine which parts of the tree provide the highest predictive power for our models. Trees have many distinct parts that are useful for classification, yet traditional approaches have relied solely on leaf or bark images. We believe the main bottleneck to solving this classification problem is access to clean, standardized data in such an unstructured environment. Our botanists in the field take images of trunks, leaves, canopies, roots, fruits, and flowers, and we are in the process of determining the correct mix to boost classification accuracy. The graph below shows the single image accuracy of our model on the test set we gathered in the Madre de Dios region of Peru.
Graph of the percentages of correctly classified images per type and species: x axis represents different species and y axis is the percentage.
For the species Brosimum alicastrum, we can see the bark images are always misclassified while the sap pictures are classified correctly 100% of the time; for this species, the sap pictures are much more useful for classification. For the species Celtis schippii, the bark images produce a classification accuracy of 84% on the test set. We think this information will be critical for determining which parts of the tree are most useful for classification. Local expert botanists will be able to use it as they help us build our species datasets and as we prioritize which parts of the tree are most relevant for our models. It also helps us understand the performance of our models and gives us a better understanding of their inner workings so we can continue to improve our species classification results.

The first species classification model we experimented with was implemented using single images from different parts of the trees. The model saw pictures from 5 different trees for each species during the training phase. To make sure the model worked in real-world scenarios, we created another set to test its accuracy on trees it had never seen before. This test set consisted of 3 different trees for each species. We tested our first model on this test set using all single images (bark, branching, canopy, flower, fruit, leaf, profile, root, sap, and branch): the model predicted a species for each one of the images and performed quite well.
Above is a confusion matrix showing the normalized scores of our results. The y-axis represents the true species and the x-axis the model’s predicted species. When all images from a species are correctly identified, the score is equal to 1. A perfect model would show a bright (white) diagonal from top left to bottom right, with all off-diagonal squares black. In this case, a bright diagonal line and darker outer squares represent higher accuracy. The best results are for the species Iriartea deltoidea, with a normalized score of 0.77: 77% of the Iriartea deltoidea images were correctly classified. Indeed, it has the brightest square on the diagonal and the darkest squares going horizontally and vertically from it.
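The normalized scores in such a matrix are simply raw counts divided by the number of images per true species, so the diagonal reads directly as per-species accuracy. A toy example (the counts are invented, not our results):

```python
import numpy as np

# Toy counts: rows are true species, columns are predicted species.
counts = np.array([
    [7, 2, 1],
    [1, 8, 1],
    [3, 0, 7],
])

# Row-normalize so each row sums to 1; the diagonal then gives the
# fraction of each species' images classified correctly.
normalized = counts / counts.sum(axis=1, keepdims=True)
per_species_accuracy = np.diag(normalized)
overall_accuracy = counts.trace() / counts.sum()

print(per_species_accuracy)        # -> [0.7 0.8 0.7]
print(round(overall_accuracy, 2))  # -> 0.73
```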

Overall, when using images from different parts of the trees, our model is able to correctly find the right species in 48% of the test images. Our next step is to group tree images together so the model has more information to correctly classify the species. Currently, we are only using the leaf picture to identify the species, but once we start to use a combination of leaf, trunk, root, fruit, flower, and profile images of the tree, we should start seeing better results.
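One simple way to group several images of the same tree, if the per-image classifier is kept as-is, is a majority vote over its per-image predictions. This is a sketch of that idea, not the aggregation method we have settled on:

```python
from collections import Counter

def aggregate_predictions(per_image_preds):
    """Combine per-image species predictions for one tree by majority
    vote; ties are broken by whichever species was predicted first."""
    return Counter(per_image_preds).most_common(1)[0][0]

# Hypothetical predictions from the leaf, bark, and fruit images of one tree
preds = ["Iriartea deltoidea", "Celtis schippii", "Iriartea deltoidea"]
print(aggregate_predictions(preds))  # -> Iriartea deltoidea
```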

Summary and Next Steps

Our team is developing rigorous models to considerably advance forest inventory practices. We have implemented automated methods for each of the steps described here, making our solutions scalable to the rest of the world while remaining local and precise thanks to our partners on the ground.