Path Planning Method Combining Depth Learning and Sarsa Algorithm

Detailed bibliography
Published in: 2017 10th International Symposium on Computational Intelligence and Design (ISCID), Vol. 2, pp. 77-82
Main authors: Xu, Dong; Fang, Yicheng; Zhang, Ziying; Meng, Yulong
Format: Conference paper
Language: English
Publication details: IEEE, 1 December 2017
ISSN: 2473-3547
Description
Summary: When the traditional Sarsa(λ) algorithm is applied to path planning, problems such as slow learning of environmental knowledge and the neglect of much useful information arise. This paper proposes a method that combines stacked denoising autoencoders with the Sarsa(λ) algorithm. Real-time environmental features are extracted by stacked denoising sparse autoencoders, which also eliminates the impact of environmental noise. Position information is obtained by mapping these features through a SOM neural network, and this position information yields the reward value R. Sarsa(λ) then updates the Q value based on R and carries out the corresponding path planning. At the same time, the SOM mapping avoids the long iterative training and output errors of other neural networks. The method extracts environmental feature information more effectively and makes path planning more accurate and efficient. Simulation experiments on agent path planning in complex 2D and 3D environments compare the traditional Sarsa(λ) algorithm with the proposed DSAE-Sarsa(λ). Results on path planning performance, algorithm convergence, convergence speed, and reward values verify the capability and superiority of the proposed algorithm.
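
The update described in the summary ("Sarsa(λ) updates the Q value based on the R value") builds on the standard tabular Sarsa(λ) backup with eligibility traces. The Python sketch below illustrates that standard rule only; the table sizes and the parameters alpha, gamma and lam are illustrative assumptions rather than values taken from the paper, and the state index s is assumed to come from the SOM mapping described above.

    import numpy as np

    # Minimal sketch of the tabular Sarsa(lambda) backup that the
    # DSAE-Sarsa(lambda) method builds on. Sizes and hyperparameters
    # below are illustrative assumptions, not values from the paper.
    n_states, n_actions = 100, 4
    alpha, gamma, lam = 0.1, 0.95, 0.9

    Q = np.zeros((n_states, n_actions))   # action-value table
    E = np.zeros((n_states, n_actions))   # eligibility traces

    def sarsa_lambda_step(s, a, r, s_next, a_next):
        """One Sarsa(lambda) update: the TD error computed from the
        reward R is credited to all recently visited state-action pairs."""
        global Q, E
        delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error
        E[s, a] += 1.0                                   # accumulating trace
        Q += alpha * delta * E                           # update recent pairs
        E *= gamma * lam                                 # decay the traces

At each environment step the agent would call sarsa_lambda_step with the state index produced by the SOM mapping and the reward R derived from that position information; the feature-extraction front end (stacked denoising sparse autoencoder and SOM) is not shown here.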
DOI: 10.1109/ISCID.2017.145