Biology | Environment
Ritik Singhal, 2007 | Reinach, BL
Deforestation is one of the most pressing drivers of biodiversity loss today and one of the many ways in which human activity affects the biosphere and the organisms around us. The Congo Rainforest, the second-largest rainforest in the world (behind the Amazon), experiences much of this deforestation but lacks enforceable policies to curb illegal logging. Models that predict where future deforestation will occur would therefore be of interest to governments and NGOs around the world. This paper proposes and evaluates five models for this purpose. The Temporal Convolutional Network (TCN) outperformed the rest, with AUC values ranging from 0.892 to 0.982 and F1 values from 0.811 to 0.907 for the years 2020-2022 after being trained on data from 2000-2019. This network also increased the spatial resolution by a factor of more than 200 compared to current state-of-the-art networks and uses only free, publicly available data, making it well suited to transfer learning for other regions globally.
Introduction
(I) Question 1. To what extent can convolutional neural networks (CNNs) successfully predict deforestation using only easily accessible, publicly available data (i.e., without localized indicators) when trained on data from 2000-2019 and tested on 2020-2022? (II) Question 2. To what degree can the state-of-the-art (SOTA) spatial accuracy, and thereby predictive accuracy, be improved upon?
Methods
The deep learning models were trained in Python (ver. 3.9) on three datasets: Landsat 7 ETM+ C2L2 satellite imagery, topographical information from NASA’s Digital Terrain Model, and the Global Forest Change (GFC) dataset from the University of Maryland. The GFC dataset augmented the satellite data by indicating whether and when each individual pixel had been deforested, which made temporal patterns easier to identify. In total, five models were investigated: Random Forests (RF), 2D CNNs, 3D CNNs, Long Short-Term Memory (LSTM) based CNNs, and TCNs. The RFs served as baseline models and were built with scikit-learn (SKLearn), while the CNNs were built with PyTorch. Google Computing and Amazon Web Services were used for these computations.
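The per-pixel labelling and the 2000-2019 / 2020-2022 split described above can be sketched as follows. This is a minimal illustration, not the author's pipeline: it assumes the loss-year convention of the GFC `lossyear` band (0 for no detected loss, k for loss detected in year 2000 + k), and the helper names `loss_series` and `temporal_split` are hypothetical.

```python
from typing import List, Tuple

# Assumption: GFC-style loss-year codes, where 0 means "no loss detected"
# and a code k > 0 means "loss detected in year 2000 + k".
BASE_YEAR = 2000
YEARS = list(range(2000, 2023))  # 2000..2022 inclusive


def loss_series(loss_code: int, years: List[int]) -> List[int]:
    """Expand one pixel's loss code into a yearly 0/1 'deforested' series."""
    if loss_code == 0:
        return [0] * len(years)
    loss_year = BASE_YEAR + loss_code
    # The pixel counts as deforested from its loss year onward.
    return [1 if year >= loss_year else 0 for year in years]


def temporal_split(
    series: List[int], years: List[int], last_train_year: int = 2019
) -> Tuple[List[int], List[int]]:
    """Split a yearly series into training years and held-out test years."""
    train = [v for v, y in zip(series, years) if y <= last_train_year]
    test = [v for v, y in zip(series, years) if y > last_train_year]
    return train, test


# Hypothetical pixel that was cleared in 2021 (loss code 21).
pixel = loss_series(21, YEARS)
train_part, test_part = temporal_split(pixel, YEARS)
```

Splitting by year rather than at random keeps every test year strictly after the training period, which matches the forecasting setup described in the text.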
Results
For each test year (2020-2022), 20 trials were conducted per model, and the Area Under the Receiver Operating Characteristic Curve (AUC), Precision, Recall, and F1 values were calculated. The TCN outperformed the other models, with an AUC of 0.982 and an F1 score of 0.907 when predicting 2020, declining to 0.892 and 0.811, respectively, over 2021 and 2022. The baseline RF model performed the worst, with AUCs of 0.675-0.814 between 2020 and 2022. This steep drop in performance from 2020 to 2022 is mirrored in the 2D CNN, even though it performed much better than the RFs. Lastly, although the two performed similarly on the 2020 test set, the LSTM-based CNN outperformed the 3D CNN in later years, with F1 values of 0.800 and 0.766, respectively.
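For readers unfamiliar with the reported metrics, the sketch below computes Precision, Recall, F1, and AUC from their standard definitions for a binary "deforested / not deforested" label. It is an illustration only: the function names are hypothetical, the example values are made up, and the study itself may well have used library routines instead.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for binary labels (1 = deforested)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive pixel receives a
    higher score than a random negative pixel (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: four pixels with predicted deforestation probabilities.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
hard_preds = [1 if p >= 0.5 else 0 for p in probs]  # threshold is an assumption

prec, rec, f1 = precision_recall_f1(labels, hard_preds)
auc = roc_auc(labels, probs)
```

AUC is computed from the raw scores and is therefore independent of the decision threshold, whereas Precision, Recall, and F1 depend on the threshold chosen to turn scores into labels.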
Discussion
The fact that the TCN performed the best across all years, especially in 2022 despite missing some data, suggests that deforestation has a certain amount of inertia: if it begins somewhere with a given directionality, it is likely to continue for the next few years. The assumption that the forest was in quasi-equilibrium (i.e., that deforestation patterns in 2000-2019 accurately reflect those in 2020-2022) was also supported by the high rate of successful predictions by all CNNs. Although (I) the models successfully predicted deforestation without the use of localized indicators, and (II) the SOTA spatial accuracy was improved by a factor of 225, the mechanisms that explain why these models were so successful, and why some were more successful than others, are still unclear. A natural next step would be to conduct a full Local Interpretable Model-agnostic Explanations (LIME) analysis to uncover which inputs drive the models' predictions. Such an analysis could yield further insight and improve our ability to combat the harmful effects of deforestation.
Conclusions
This study shows that patterns of deforestation can be accurately predicted using easily accessible data. Compared to previously published work, the CNNs developed in this paper provide two main benefits. Firstly, they successfully predicted deforestation patterns without the use of on-the-ground measurements or localized indicators; secondly, they did so at a more local level, with predictions at a granularity of a few hundred square meters rather than square kilometers. In principle, the same approach can be applied to predict deforestation in any other part of the world using the same data; however, more work is needed to demonstrate this applicability globally.
Expert's Appraisal
Christopher Johnson
Ritik developed a modeling approach that uses deep learning algorithms to predict rainforest deforestation in the Congo Basin. He also compared the effectiveness of different algorithms at predicting deforestation over time. The best model achieved an accuracy comparable to that of known leading models, but at a much finer spatial resolution, which is more relevant for conservation and policy. In addition, his approach should be much more flexible and applicable to forests worldwide, since only publicly available data were used.
Rating:
outstanding
Special prize «Regeneron International Science and Engineering Fair (ISEF)», donated by the Gebauer Stiftung
International School Basel, Reinach
Teacher: Paula Rowlands