M2fNet: Multi-Modal Forest Monitoring Network on Large-Scale Virtual Dataset

Bibliographic details
Published in:	2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 539-543
Main authors:	Lu, Yawen; Huang, Yunhan; Sun, Su; Zhang, Tansi; Zhang, Xuewen; Fei, Songlin; Chen, Victor
Format:	Conference paper
Language:	English
Published:	IEEE, 16 March 2024
Subjects:	Benchmark testing; Computer vision; Computing methodologies-Artificial intelligence-Computer vision-Image segmentation / Object detection; Computing methodologies-Modeling and simulation-Simulation support systems-Simulation environments; Forestry; Three-dimensional displays; Training; Vegetation; Virtual reality

Description
Summary:	Forest monitoring and education are key to forest protection and management, and they offer an effective way to measure the progress of a country's forest and climate commitments. Due to the lack of a large-scale wild forest monitoring benchmark, the common practice is to train a model on a widely used outdoor benchmark (e.g., KITTI) and evaluate it on real forest datasets (e.g., CanaTree100). However, there is a large domain gap in this setting, which makes evaluation and deployment difficult. In this paper, we propose a new photorealistic virtual forest dataset and a multimodal transformer-based algorithm for tree detection and instance segmentation. To the best of our knowledge, this is the first time that a multimodal detection and segmentation algorithm has been applied to large-scale forest scenes. We believe that the proposed dataset and method will inspire the simulation, computer vision, education and forestry communities towards a more comprehensive multimodal understanding.
DOI:	10.1109/VRW62533.2024.00104