Neural Network Approach to Program Synthesis for Tabular Transformation by Example

Bibliographic Details
Published in: IEEE Access, Vol. 10, pp. 24864-24876
Main Authors: Ujibashi, Yoshifumi; Takasu, Atsuhiro
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers (IEEE), 2022
ISSN: 2169-3536
Description
Summary: Data transformation is a laborious and time-consuming task for analysts. Programming by example (PBE) is a technique that can simplify this difficult task for data analysts by automatically generating data-transformation programs. Most previously proposed PBE methods are based on search algorithms, but recent advances in machine learning (ML) have led to its application in PBE research. For example, RobustFill was proposed as an ML-based PBE method for string transformation using a long short-term memory (LSTM) sequential encoder-decoder model. However, no ML-based PBE method has been developed for tabular transformations, which are used frequently in data analysis. Thus, in the present study, we propose an ML-based PBE method for tabular transformations. First, we consider the features of tabular transformations, which are more complex and data-intensive than string transformations, and propose a new ML-based PBE method using the state-of-the-art Transformer sequential encoder-decoder model. To our knowledge, this is the first ML-based PBE method for tabular transformations. We also propose two decoding methods, multistep beam search and program validation-beam search, which are optimized for program generation and thus generate correct programs with higher accuracy. Our evaluation results demonstrated that the Transformer-based PBE model performed much better than the LSTM-based PBE model when applied to tabular transformations. Furthermore, the Transformer-based model with the proposed decoding method outperformed a conventional PBE model that uses a search-based method.
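
The summary mentions a program validation-beam search decoder that checks candidate programs against the user-supplied input-output examples. The sketch below is only a generic illustration of that idea, not the authors' implementation: decoder_step (which scores possible next tokens for a partial program) and execute_program (which runs a candidate program on an example input) are hypothetical stand-ins for a trained Transformer decoder and a DSL interpreter.

import heapq

def validation_beam_search(decoder_step, execute_program, examples,
                           beam_width=10, max_len=50, eos_token="<EOS>"):
    """Return the highest-scoring complete program that reproduces all examples."""
    beams = [(0.0, [])]   # (log-probability, partial token sequence)
    finished = []         # complete programs that passed validation

    for _ in range(max_len):
        # Expand every open beam by one token.
        candidates = []
        for score, tokens in beams:
            # decoder_step: partial program -> [(next_token, log_prob), ...] (hypothetical)
            for token, logp in decoder_step(tokens):
                candidates.append((score + logp, tokens + [token]))
        # Keep the top-k partial programs by log-probability.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])

        still_open = []
        for score, tokens in beams:
            if tokens and tokens[-1] == eos_token:
                program = tokens[:-1]
                # Validation step: accept a complete program only if it maps
                # every example input to its expected output.
                if all(execute_program(program, inp) == out for inp, out in examples):
                    finished.append((score, program))
            else:
                still_open.append((score, tokens))
        beams = still_open
        if not beams:
            break

    return max(finished, key=lambda c: c[0])[1] if finished else None

The point this sketch illustrates is that candidates are accepted only after they reproduce every provided example, which is how a validation step can raise the accuracy of the returned program relative to plain beam search.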
DOI: 10.1109/ACCESS.2022.3155468