Memory Optimization Method for Parallel Computing Framework Based on Distributed Dataset
| Published in: | Ji suan ji gong cheng Vol. 49; no. 4; pp. 43 - 51 |
|---|---|
| Format: | Journal Article |
| Language: | Chinese; English |
| Published: | Editorial Office of Computer Engineering, 01.04.2023 |
| ISSN: | 1000-3428 |
| Summary: | With the rapid development of scientific computing and artificial intelligence, parallel computing in distributed environments has become an important approach to solving large-scale theoretical computing and data processing problems. Growing memory capacity and the wide adoption of iterative algorithms have made in-memory computing technology, represented by Spark, increasingly mature. However, current mainstream distributed memory models and computing frameworks struggle to balance ease of use with computing performance, and they have deficiencies in data format definition, memory allocation, and memory utilization efficiency. A parallel computing method based on distributed datasets is proposed, which optimizes in-memory computing in terms of both model theory and system overhead. On the theoretical side, modeling and analysis of the computation process addresses the limited expressive power of Spark in scientific computing environments, and the resulting overhead model of the computing framework supports subsequent performance optimization. On the system side, a framework-level memory optimization method is proposed, whose main modules cover the reconstruction of cross-language distributed in-memory datasets, the management of distributed shared memory, and the optimization of the message passing process. Experimental results show that a parallel computing framework built on this optimization method significantly improves the memory allocation efficiency of datasets, reduces serialization/deserialization overhead, and relieves memory occupation pressure; the execution time of the test applications is reduced by 69%-92% compared with Spark. |
|---|---|
| DOI: | 10.19678/j.issn.1000-3428.0066025 |
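The abstract attributes much of the reported speedup to avoiding serialization/deserialization by keeping datasets in distributed shared memory. The paper's actual implementation is not reproduced in this record; purely as an illustration of that general idea, the minimal Python sketch below shares a NumPy array between worker processes through a shared-memory block instead of pickling a copy for each worker. All names in it are hypothetical and unrelated to the framework described in the paper.

```python
# Illustrative sketch only (not the paper's code): workers attach to one
# shared-memory block rather than receiving a serialized copy of the data.
import numpy as np
from multiprocessing import Process, shared_memory


def worker(shm_name, shape, dtype):
    # Attach to the existing shared-memory block; no copy, no deserialization.
    shm = shared_memory.SharedMemory(name=shm_name)
    data = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    print("partition sum:", data.sum())
    shm.close()


if __name__ == "__main__":
    src = np.arange(1_000_000, dtype=np.float64)

    # Allocate one shared block and copy the dataset into it once.
    shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
    view = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
    view[:] = src

    # Each worker maps the same block instead of getting a pickled copy.
    procs = [Process(target=worker, args=(shm.name, src.shape, src.dtype))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    shm.close()
    shm.unlink()
```

In a pickling-based exchange each worker would pay the cost of serializing, transferring, and deserializing the full array; mapping one shared block sidesteps both costs, which is the kind of overhead reduction the abstract reports at framework scale.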