M2fNet: Multi-Modal Forest Monitoring Network on Large-Scale Virtual Dataset

Bibliographic Details
Published in:	2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 539-543
Main Authors:	Lu, Yawen; Huang, Yunhan; Sun, Su; Zhang, Tansi; Zhang, Xuewen; Fei, Songlin; Chen, Victor
Format:	Conference Proceeding
Language:	English
Published:	IEEE 16.03.2024
Subjects:	Benchmark testing; Computer vision; Computing methodologies-Artificial intelligence-Computer vision-Image segmentation / Object detection; Computing methodologies-Modeling and simulation-Simulation support systems-Simulation environments; Forestry; Three-dimensional displays; Training; Vegetation; Virtual reality

Description
Summary:	Forest monitoring and education are key to forest protection and management, and they offer an effective way to measure progress on a country's forest and climate commitments. Because no large-scale wild forest monitoring benchmark exists, the common practice is to train a model on a general outdoor benchmark (e.g., KITTI) and evaluate it on real forest datasets (e.g., CanaTree100). However, the large domain gap in this setting makes evaluation and deployment difficult. In this paper, we propose a new photorealistic virtual forest dataset and a multimodal transformer-based algorithm for tree detection and instance segmentation. To the best of our knowledge, this is the first time a multimodal detection and segmentation algorithm has been applied to large-scale forest scenes. We believe that the proposed dataset and method will inspire the simulation, computer vision, education and forestry communities towards a more comprehensive multi-modal understanding.
DOI:	10.1109/VRW62533.2024.00104