Memory Optimization Method for Parallel Computing Framework Based on Distributed Dataset

Bibliographic Details
Published in: Ji suan ji gong cheng (Computer Engineering), Vol. 49, No. 4, pp. 43-51
Main Authors: XIA Libin, LIU Xiaoyu, JIANG Xiaowei, SUN Gongxing
Format: Journal Article
Language: Chinese; English
Published: Editorial Office of Computer Engineering, 01.04.2023
ISSN: 1000-3428
Description
Summary: With the rapid development of scientific computing and artificial intelligence technology, parallel computing in distributed environments has become an important method for solving large-scale theoretical computing and data processing problems. Growing memory capacity and the wide adoption of iterative algorithms have made in-memory computing technology, represented by Spark, increasingly mature. However, current mainstream distributed memory models and computing frameworks struggle to balance ease of use with computing performance, and they have deficiencies in data format definition, memory allocation, and memory utilization efficiency. A parallel computing method based on distributed datasets is proposed, which optimizes memory computing in terms of both model theory and system overhead. On the theoretical side, modeling and analysis of the calculation process address the limited expressive ability of Spark in scientific computing environments, and an overhead model of the computing framework supports subsequent performance optimization. On the system side, a framework-level memory optimization method is proposed, comprising modules for reconstructing cross-language distributed in-memory datasets, managing distributed shared memory, and optimizing the message delivery process. Experimental results show that the parallel computing framework based on this optimization method significantly improves the memory allocation efficiency of datasets, reduces serialization/deserialization overhead, and alleviates memory occupation pressure. The execution time of the test applications was reduced by 69%-92% compared with Spark.
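
The paper's own framework and source code are not part of this record. As a hedged illustration only, the PySpark sketch below shows the kind of cross-language (JVM-to-Python) serialization/deserialization cost the summary targets, using Spark's built-in Arrow-based columnar transfer as a stand-in for the shared in-memory dataset exchange described above. The helper timed_to_pandas, the application name, and the data size are illustrative assumptions; running it requires pyspark, pandas, and pyarrow.

# Illustrative sketch only (not the paper's framework): compares row-at-a-time
# pickle transfer with Arrow-based columnar transfer when moving a Spark
# dataset from the JVM into the Python driver process.
import time
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("serde-overhead-demo")   # hypothetical application name
         .getOrCreate())

df = spark.range(5_000_000).selectExpr("id", "id * 2 AS doubled")

def timed_to_pandas(use_arrow: bool) -> float:
    # Toggle Arrow-based columnar exchange for the JVM -> Python conversion.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", str(use_arrow).lower())
    start = time.perf_counter()
    df.toPandas()   # materializes the dataset in the Python driver
    return time.perf_counter() - start

print("row-at-a-time pickle transfer: %.2f s" % timed_to_pandas(False))
print("Arrow columnar transfer:       %.2f s" % timed_to_pandas(True))

spark.stop()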
DOI: 10.19678/j.issn.1000-3428.0066025
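
The distributed shared-memory management module mentioned in the summary is likewise not reproduced here. As a rough single-node analogy under stated assumptions, the sketch below uses Python's standard multiprocessing.shared_memory so that a worker process reads a NumPy dataset in place, without copying or re-serializing it, which is the kind of memory-occupation relief the summary claims; the segment layout and worker logic are purely illustrative.

# Illustrative single-node analogy (not the paper's implementation): a dataset
# is written once into a named shared-memory segment, and a worker process
# attaches to it and reads it in place.
from multiprocessing import Process, shared_memory
import numpy as np

def worker(shm_name, shape, dtype):
    # Attach to the existing segment: the worker reads the data in place,
    # with no copy and no serialization/deserialization step.
    shm = shared_memory.SharedMemory(name=shm_name)
    view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    print("worker sum:", view.sum())
    shm.close()

if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=np.float64)
    # Place the dataset in a named shared-memory segment once.
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)[:] = data

    p = Process(target=worker, args=(shm.name, data.shape, data.dtype))
    p.start()
    p.join()

    shm.close()
    shm.unlink()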