Research on Intelligent Penetration Paths Based on an Improved Proximal Policy Optimization Algorithm
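TP309; Penetration path planning is the first step of penetration testing and is of great significance for automating penetration testing. Existing research on penetration path planning mostly models penetration testing as an idealized, fully observable process, which struggles to reflect real penetration testing, where the environment is only partially observable. Given the wide application of reinforcement learning in penetration testing, the penetration testing process is modeled as a partially observable Markov decision process, so that real penetration testing is simulated more accurately. On this basis, to address the problem that the PPO algorithm, which fits the policy function and value function with fully connected layers, cannot extract effective features from a partially observable space, an improved PPO algorithm, RPPO, is proposed, in which both the policy network and the evaluation network combine fully connected layers with an LSTM network structure to improve feature extraction in unknown environments. In addition, a new objective function update method is given to enhance the robustness and convergence of the algorithm. Experimental results...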
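The architecture the abstract describes, fully connected layers combined with an LSTM so that the policy can carry information across partial observations, can be illustrated with a forward pass only. The following is a minimal sketch in pure Python, not the authors' implementation: the class name `RecurrentPolicy`, the layer sizes, the random initialization, and the example observation are all assumptions made for illustration, and no training step is shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, bias, vec):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class RecurrentPolicy:
    """Hypothetical FC encoder -> LSTM cell -> FC action head."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.enc = init_layer(rng, hidden_dim, obs_dim)
        # One weight block per LSTM gate; each sees [encoded obs, previous h].
        self.gates = {name: init_layer(rng, hidden_dim, 2 * hidden_dim)
                      for name in ("i", "f", "o", "g")}
        self.head = init_layer(rng, n_actions, hidden_dim)

    def initial_state(self):
        zeros = [0.0] * self.hidden_dim
        return zeros, list(zeros)

    def step(self, obs, state):
        """Consume one partial observation; return action probs and new state."""
        h_prev, c_prev = state
        x = [math.tanh(v) for v in linear(*self.enc, obs)]
        xh = x + h_prev
        i = [sigmoid(v) for v in linear(*self.gates["i"], xh)]    # input gate
        f = [sigmoid(v) for v in linear(*self.gates["f"], xh)]    # forget gate
        o = [sigmoid(v) for v in linear(*self.gates["o"], xh)]    # output gate
        g = [math.tanh(v) for v in linear(*self.gates["g"], xh)]  # candidate
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return softmax(linear(*self.head, h)), (h, c)


# The same observation seen twice yields different action distributions,
# because the LSTM state remembers the earlier step.
policy = RecurrentPolicy(obs_dim=4, hidden_dim=8, n_actions=3)
obs = [1.0, 0.0, 0.0, 1.0]
state = policy.initial_state()
probs_first, state = policy.step(obs, state)
probs_second, state = policy.step(obs, state)
```

An evaluation (value) network in the same style would presumably share this structure and replace the softmax head with a single scalar output; the abstract does not give those details.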
| Published in: | 计算机科学 (Computer Science), Vol. 51, No. z2, pp. 851-856 |
|---|---|
| Main authors: | WANG Ziyang, WANG Jia, XIONG Mingliang, WANG Wentao |
| Format: | Journal Article |
| Language: | Chinese |
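| Publisher details: | School of Computer Science and Technology, Xinjiang University, Urumqi 830000; Xinjiang Uygur Autonomous Region Key Laboratory of Multilingual Information Technology, Urumqi 830000; 2024-11-16 |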
| ISSN: | 1002-137X |
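| Abstract | TP309; Penetration path planning is the first step of penetration testing and is of great significance for automating penetration testing. Existing research on penetration path planning mostly models penetration testing as an idealized, fully observable process, which struggles to reflect real penetration testing, where the environment is only partially observable. Given the wide application of reinforcement learning in penetration testing, the penetration testing process is modeled as a partially observable Markov decision process, so that real penetration testing is simulated more accurately. On this basis, to address the problem that the PPO algorithm, which fits the policy function and value function with fully connected layers, cannot extract effective features from a partially observable space, an improved PPO algorithm, RPPO, is proposed, in which both the policy network and the evaluation network combine fully connected layers with an LSTM network structure to improve feature extraction in unknown environments. In addition, a new objective function update method is given to enhance the robustness and convergence of the algorithm. Experimental results show that, across different network scenarios and compared with the existing A2C, PPO, and NDSPI-DQN algorithms, RPPO reduces the number of rounds to convergence by 21.21%, 28.64%, and 22.85%, respectively, and increases cumulative reward by 66.01%, 58.61%, and 132.64%, respectively, making it better suited to larger-scale network environments with more than 50 hosts. |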
|---|---|
| Abstract_FL | Penetration path planning is the first step of penetration testing and is important for intelligent penetration testing. Existing studies on penetration path planning usually model penetration testing as a fully observable process, which makes it difficult to describe actual penetration testing, with its partial observability, accurately. Given the wide application of reinforcement learning in penetration testing, this paper models penetration testing as a partially observable Markov decision process in order to simulate practical penetration testing accurately. The fully connected layers of the policy network and evaluation network in PPO cannot extract features effectively in penetration testing with partial observability. This paper therefore proposes an improved PPO algorithm, RPPO, which integrates fully connected layers and long short-term memory (LSTM) in both the policy network and the evaluation network. In addition, a new objective function update is designed to improve robustness and convergence. Experimental results show that the proposed RPPO converges faster than the A2C, PPO, and NDSPI-DQN algorithms: the number of convergence iterations is reduced by 21.21%, 28.64%, and 22.85%, respectively. RPPO also gains about 66.01%, 58.61%, and 132.64% more cumulative reward, respectively, and is more suitable for larger-scale network environments with more than fifty hosts. |
| Author | 王文涛 王紫阳 熊明亮 王佳 |
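| AuthorAffiliation | School of Computer Science and Technology, Xinjiang University, Urumqi 830000; Xinjiang Uygur Autonomous Region Key Laboratory of Multilingual Information Technology, Urumqi 830000 |
| AuthorAffiliation_xml | – name: School of Computer Science and Technology, Xinjiang University, Urumqi 830000; Xinjiang Uygur Autonomous Region Key Laboratory of Multilingual Information Technology, Urumqi 830000 |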
| Author_FL | WANG Jia WANG Ziyang WANG Wentao XIONG Mingliang |
| Author_FL_xml | – sequence: 1 fullname: WANG Ziyang – sequence: 2 fullname: WANG Jia – sequence: 3 fullname: XIONG Mingliang – sequence: 4 fullname: WANG Wentao |
| Author_xml | – sequence: 1 fullname: 王紫阳 – sequence: 2 fullname: 王佳 – sequence: 3 fullname: 熊明亮 – sequence: 4 fullname: 王文涛 |
| ClassificationCodes | TP309 |
| ContentType | Journal Article |
| Copyright | Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
| Copyright_xml | – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
| DBID | 2B. 4A8 92I 93N PSX TCJ |
| DOI | 10.11896/jsjkx.231200165 |
| DatabaseName | Wanfang Data Journals - Hong Kong WANFANG Data Centre Wanfang Data Journals 万方数据期刊 - 香港版 China Online Journals (COJ) China Online Journals (COJ) |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| DocumentTitle_FL | Intelligent Penetration Path Based on Improved PPO Algorithm |
| EndPage | 856 |
| ExternalDocumentID | jsjkx2024z2116 |
| GroupedDBID | -0Y 2B. 4A8 5XA 5XJ 92H 92I 93N ABJNI ACGFS ALMA_UNASSIGNED_HOLDINGS CCEZO CUBFJ CW9 GROUPED_DOAJ PSX TCJ TGT U1G U5S |
| ID | FETCH-LOGICAL-s1036-225f711d3f129dff055b45cff516870b3750811605c46a744890daabb6d88be43 |
| ISSN | 1002-137X |
| IngestDate | Thu May 29 04:00:14 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | z2 |
| Keywords | Penetration testing; Penetration path planning; Reinforcement learning; Proximal policy optimization; Long short-term memory networks |
| Language | Chinese |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-s1036-225f711d3f129dff055b45cff516870b3750811605c46a744890daabb6d88be43 |
| PageCount | 6 |
| ParticipantIDs | wanfang_journals_jsjkx2024z2116 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-11-16 |
| PublicationDateYYYYMMDD | 2024-11-16 |
| PublicationDate_xml | – month: 11 year: 2024 text: 2024-11-16 day: 16 |
| PublicationDecade | 2020 |
| PublicationTitle | 计算机科学 |
| PublicationTitle_FL | Computer Science |
| PublicationYear | 2024 |
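| Publisher | School of Computer Science and Technology, Xinjiang University, Urumqi 830000; Xinjiang Uygur Autonomous Region Key Laboratory of Multilingual Information Technology, Urumqi 830000 |
| Publisher_xml | – name: School of Computer Science and Technology, Xinjiang University, Urumqi 830000 – name: Xinjiang Uygur Autonomous Region Key Laboratory of Multilingual Information Technology, Urumqi 830000 |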
| SSID | ssib023646461 ssib051375750 ssib001164759 ssj0057673 |
| Score | 2.397263 |
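| Snippet | TP309; Penetration path planning is the first step of penetration testing and is of great significance for automating penetration testing. Existing research on penetration path planning mostly models penetration testing as an idealized, fully observable process, which struggles to reflect the partially observable... |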
| SourceID | wanfang |
| SourceType | Aggregation Database |
| StartPage | 851 |
| Title | Research on Intelligent Penetration Paths Based on an Improved Proximal Policy Optimization Algorithm |
| URI | https://d.wanfangdata.com.cn/periodical/jsjkx2024z2116 |
| Volume | 51 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVAON databaseName: DOAJ Open Access Full Text issn: 1002-137X databaseCode: DOA dateStart: 20210101 customDbUrl: isFulltext: true dateEnd: 99991231 titleUrlDefault: https://www.doaj.org/ omitProxy: false ssIdentifier: ssj0057673 providerName: Directory of Open Access Journals |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV05b9RAFB4lgYKGG3GTgqmiBR9zVsjeeEUVUQQpXTS7XgMBLSgbomgrJFIiEFIOjgIaCoQgBRLHCsifya6Xf8F7YztxDglS0FhvZ77xu9Z-b0aeN4RcljEWWdOiohpcVlhsmhWVxLySuE4sHINRz9jDJuTEhJqa0jeGhrvFXpj5e7LVUgsL-sF_dTW0gbNx6-w-3L15U2gAGpwOV3A7XP_J8TTiVNdoGNCI4VVFNBJUA61ppGhYozosCJdGkgYhDWqWGKdaIKE5DbgdXqVa4Q1VNe8KIqol3jD0EYZgYMEsC22ZKqp8Go5bjLJgjZ9TqJplKi0vTsPIjoKfDsqGBAwX5VwZ8cAucLfx1VXLBVqklZ-j2MHm-iL2gMYqRCJkqBzwByVCfw8IQ0F39AiqAstIWdNZGwbRHoMFmkRlxgA6LC-geAx3Emb7O-1fPoeHjjWZQDYoOkM1Ch32pbCVTOXeBYnBxBk4s2-O0eiVwBuzaI1uhKbQg_lDgVa2BTzvjCkf84pSmLJH0vhyqhzH8sK92fPa8UpRSeVdeYKTVXLfHTuVxnWcmfbM3YUrkPbj13bZOR47KpJbBFqy47muGCYHPMm1Ki1n2FQcC9Vtpcp4ToEolS7kID3MFJwia4Jpr8w2w-SaFZ8UgFBXd4hk99e1EtO6VUoFJ4-Sw_kcbjTInr1jZKhz-zg5UpyPMpqHyxPkWu9Nd6P7tL_0fbD-erD-PP2wln5cSZffbfx40Xuykn5a7X9eTl8t9l92B49_9r-t_n70bPB1rfdrMX27lL7_cpLcrEWT1euV_MCSStvFwt4QGxPpurGfQBYdJ4nDeZ3xRpJwV0BcrIPKkIG7wuENJoxkTGknNqZeF7FS9SbzT5GR1v1W8zQZlb5xmHRk7PmGGaZNQ8RcJokvTJ0n2pwhl3IbTOfvnvb0dr-c_SviHDm09TycJyNzsw-bF8jBxvzcnfbsRevOP7HHpCw |
| linkProvider | Directory of Open Access Journals |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=%E5%9F%BA%E4%BA%8E%E6%94%B9%E8%BF%9B%E8%BF%91%E7%AB%AF%E7%AD%96%E7%95%A5%E4%BC%98%E5%8C%96%E7%AE%97%E6%B3%95%E7%9A%84%E6%99%BA%E8%83%BD%E6%B8%97%E9%80%8F%E8%B7%AF%E5%BE%84%E7%A0%94%E7%A9%B6&rft.jtitle=%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%A7%91%E5%AD%A6&rft.au=%E7%8E%8B%E7%B4%AB%E9%98%B3&rft.au=%E7%8E%8B%E4%BD%B3&rft.au=%E7%86%8A%E6%98%8E%E4%BA%AE&rft.au=%E7%8E%8B%E6%96%87%E6%B6%9B&rft.date=2024-11-16&rft.pub=%E6%96%B0%E7%96%86%E5%A4%A7%E5%AD%A6%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%A7%91%E5%AD%A6%E4%B8%8E%E6%8A%80%E6%9C%AF%E5%AD%A6%E9%99%A2+%E4%B9%8C%E9%B2%81%E6%9C%A8%E9%BD%90+830000&rft.issn=1002-137X&rft.volume=51&rft.issue=z2&rft.spage=851&rft.epage=856&rft_id=info:doi/10.11896%2Fjsjkx.231200165&rft.externalDocID=jsjkx2024z2116 |
| thumbnail_s | http://cvtisr.summon.serialssolutions.com/2.0.0/image/custom?url=http%3A%2F%2Fwww.wanfangdata.com.cn%2Fimages%2FPeriodicalImages%2Fjsjkx%2Fjsjkx.jpg |
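
The abstract in this record states that RPPO changes how the PPO objective function is updated but does not give the new formula. For orientation only, the sketch below computes the standard clipped surrogate objective of baseline PPO, which is what the paper starts from; it is not the RPPO objective. The function name `ppo_clip_objective`, the default `clip_eps=0.2`, and the sample numbers are assumptions chosen for illustration.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    Each sample contributes min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the probability ratio between the new and old policies.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# The action probability rose from 0.4 to 0.6 (ratio 1.5) with a positive
# advantage, so the contribution is capped at (1 + clip_eps) * advantage.
objective = ppo_clip_objective([math.log(0.6)], [math.log(0.4)], [1.0])
```

Clipping limits how far a single update can move the policy away from the one that collected the data, which is the stability property any modified objective would need to preserve or improve.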