Path Planning Method Combining Depth Learning and Sarsa Algorithm

Bibliographic Details
Published in: 2017 10th International Symposium on Computational Intelligence and Design (ISCID), Volume 2, pp. 77-82
Main Authors: Xu, Dong; Fang, Yicheng; Zhang, Ziying; Meng, Yulong
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2017
ISSN: 2473-3547
Description
Summary: When the traditional Sarsa(λ) algorithm is applied to path planning, it learns environmental knowledge slowly and neglects much useful information. This paper proposes a method that combines Sarsa(λ) with Stacked Denoising AutoEncoders: real-time environmental features are extracted by stacked denoising sparse autoencoders, which also eliminates the impact of environmental noise. Position information is obtained by mapping these features through an SOM neural network, and the position information yields the reward value R. Sarsa(λ) then updates the Q value based on R and carries out the corresponding path planning. At the same time, the SOM mapping avoids the long iterative training and output errors of other neural networks. This approach extracts environmental characteristic information more effectively and makes path planning more accurate and efficient. The simulation experiments take agent path planning in complex 2D and 3D environments as the background and compare the traditional Sarsa(λ) algorithm with the DSAE-Sarsa(λ) proposed in this paper. The results on path planning performance, convergence, convergence speed, and reward values verify the capability and superiority of the proposed algorithm.
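For reference, the update rule the abstract refers to is the standard tabular Sarsa(λ) algorithm with eligibility traces. The sketch below shows only that generic rule, not the paper's DSAE-Sarsa(λ): the DSAE feature extraction and SOM position mapping are omitted, the env object with its reset/step interface and all hyperparameter values are placeholder assumptions, and the reward r simply stands in for the R value the paper derives from the SOM mapping.

```python
import numpy as np

# Minimal tabular Sarsa(lambda) sketch with accumulating eligibility traces.
# env, n_states, n_actions, and the hyperparameters are illustrative placeholders.
def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)

    def epsilon_greedy(s):
        # Explore with probability epsilon, otherwise act greedily on Q.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        E = np.zeros_like(Q)            # eligibility traces, reset each episode
        s = env.reset()
        a = epsilon_greedy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)     # r plays the role of the R value
            a_next = epsilon_greedy(s_next)
            # TD error for the state-action pair actually taken
            delta = r + gamma * Q[s_next, a_next] * (not done) - Q[s, a]
            E[s, a] += 1.0                    # accumulate trace for the visited pair
            Q += alpha * delta * E            # credit all recently visited pairs
            E *= gamma * lam                  # decay traces toward zero
            s, a = s_next, a_next
    return Q
```

In the paper's setting, the state index fed into this update would come from the SOM mapping of DSAE-extracted features rather than from raw coordinates, but the Q and trace updates themselves follow this standard form.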
DOI: 10.1109/ISCID.2017.145