Automated feature generation for machine learning application
| Title: | Automated feature generation for machine learning application |
|---|---|
| Patent Number: | 11,366,806 |
| Publication Date: | June 21, 2022 |
| Appl. No: | 16/531,526 |
| Application Filed: | August 05, 2019 |
| Abstract: | Various implementations include approaches for automating feature generation. The intellectual paradigm underlying the approach is ensemble learning: each generated feature is an element in an ensemble. Ensemble learning is a highly successful paradigm in classical machine learning and dominates real-world predictive analytics projects through tools such as xgboost ( . . . ) or lightgbm ( . . . ). It is also well suited to this task because of its ease of use compared with other successful paradigms such as deep learning. Moreover, it makes it possible to generate human-readable SQL code, which is very difficult with deep learning approaches. The various implementations described herein provide increased scalability and efficiency as compared with conventional approaches. (Illustrative sketches of the claimed procedure follow this record.) |
| Inventors: | The SQLNet Company GmbH (Leipzig, DE) |
| Assignees: | The SQLNet Company GmbH (Leipzig, DE) |
| Claim: | 1. A computer-implemented method of generating a feature for an ensemble-learning application comprising a set of base learners, each forming a respective feature, to make a prediction, wherein the generation of a feature is based on given relational sample data, given target values and a given loss function, the method comprising: a) forming a preliminary feature through at least one aggregation function that aggregates relational sample data or learnable weights applied to the sample data; b) calculating aggregation results with the aggregation function; c) using the loss function to calculate pseudo-residuals based on at least the target values, and determining, based upon an optimization criterion formula, how strongly the aggregation results relate to the pseudo-residuals calculated with the target values; d) adjusting the preliminary feature by incrementally changing a condition applied to the aggregation function, wherein the condition affects which sample values of the sample data are to be aggregated with the aggregation function; e) calculating aggregation results of the adjusted preliminary feature by adjusting the aggregation results of the preliminary feature before adjustment, through at least: e1) determining which sample values of the sample data are affected by the changed condition, e2) adjusting the aggregation results of the preliminary feature before adjustment, to account for a contribution of the sample values affected by the changed condition; f) determining how strongly the aggregation results of the adjusted preliminary feature relate to the pseudo-residuals calculated with the target values; and g) repeating processes (d) through (f) for a plurality of incremental changes, and selecting a feature for which the aggregation results relate most strongly to the pseudo-residuals calculated with the target values. |
| Claim: | 2. The computer-implemented method of claim 1, wherein process (g) includes repeating processes (d) through (f) at least twenty times, each time changing the applied condition such that the aggregation function aggregates either a progressively larger or a progressively smaller share of the sample data. |
| Claim: | 3. The computer-implemented method of claim 1, wherein the condition splits the sample data into different groups associated with different learnable weights. |
| Claim: | 4. The computer-implemented method of claim 1, wherein process (e) comprises: incrementally adjusting the aggregation results from the previous preliminary feature. |
| Claim: | 5. The computer-implemented method of claim 1, further comprising: (h) outputting the adjusted feature for which the aggregation results relate most strongly to the pseudo-residuals calculated with the target values for use in the ensemble-learning application. |
| Claim: | 6. The computer-implemented method of claim 1, further comprising incrementally repeating processes (a) through (g) for a plurality of additional conditions, wherein the adjusted feature becomes the preliminary feature during the repeating. |
| Claim: | 7. The computer-implemented method of claim 6, wherein calculating the result of the changed aggregation function includes using only a difference between the result from a current aggregation function and the result from a previous aggregation function. |
| Claim: | 8. The computer-implemented method of claim 6, wherein outputting the adjusted feature is performed for only one of the adjusted features for which the aggregation results relate most strongly to the pseudo-residuals. |
| Claim: | 9. The computer-implemented method of claim 1, wherein quality is defined by the optimization criterion formula. |
| Claim: | 10. The computer-implemented method of claim 1, wherein each feature is attributed to a single aggregation function. |
| Claim: | 11. The computer-implemented method of claim 1, further comprising: performing processes (a) through (g) using a distinct preliminary feature; calculating a pseudo-residual for the adjusted feature; and training the distinct preliminary feature to predict an error from the adjusted feature. |
| Claim: | 12. The computer-implemented method of claim 1, further comprising: identifying which part of the sample data is affected by the presently applied incremental change of the condition, with the help of a match change identification algorithm, wherein process (e) of calculating the aggregation results of the adjusted preliminary feature comprises: adjusting the aggregation result from the previous preliminary feature by calculating how the sample data identified by the match change identification algorithm changes the aggregation results from the previous preliminary feature. |
| Claim: | 13. The computer-implemented method of claim 1, wherein the condition defines a threshold for additional data associated with the sample data, and only sample data for which the associated additional data is on a specific side of the threshold is to be aggregated with the aggregation function, or wherein the condition defines a categorical value and only sample data to which the categorical value of the condition is attributed is to be aggregated with the aggregation function. |
| Claim: | 14. The computer-implemented method of claim 1, further comprising: after selecting, in process (g), the feature for which the aggregation results relate most strongly to the pseudo-residuals calculated with the target values: (h) adding a further condition to the aggregation function and repeating processes (d) through (g) to determine the feature with the further condition for which the aggregation results relate most strongly to the pseudo-residuals calculated with the target values; and repeating process (h) to add still further conditions until a stop algorithm determines that no more conditions are to be applied to the aggregation function. |
| Claim: | 15. The computer-implemented method of claim 1, further comprising: using an optimization criterion update formula that describes how a change in the aggregation result of the aggregation function changes a previously calculated outcome calculated with the optimization criterion formula, wherein process (f) of determining how strongly the aggregation results of the adjusted preliminary features relate to the pseudo-residuals calculated with the target values comprises: determining with the optimization criterion update formula how the change in the aggregation result affects how strongly the aggregation results of the adjusted preliminary features relate to the pseudo-residuals calculated with the target values, without using the optimization criterion formula for calculating how strongly the aggregation results of the adjusted preliminary features relate to the pseudo-residuals calculated with the target values. |
| Claim: | 16. The computer-implemented method of claim 1, wherein the relational sample data comprises sample data sets in one or more peripheral tables, wherein the target values are included in a population table, wherein calculated aggregation results are inserted into the population table, wherein the optimization criterion formula uses values from the population table but not from any peripheral table. |
| Claim: | 17. The computer-implemented method of claim 1, wherein the preliminary feature comprises at least first and second aggregation functions; the second aggregation function calculates an aggregation result from aggregation results of the first aggregation function; and one or more conditions are applied to at least one of the aggregation functions and processes (d) through (g) are carried out with respect to the one or more conditions. |
| Claim: | 18. A computer-implemented machine-learning method comprising: using one or more features determined and selected with the method of claim 1 to calculate aggregation results from sample data that is at least partially included in one or more peripheral tables; joining the calculated aggregation results to a population table which includes target values; and training a machine-learning algorithm based on the aggregation results and the target values in the population table. |
| Claim: | 19. A system comprising: a computing device having a processor and a memory, the computing device configured to generate a feature for an ensemble-learning application comprising a set of base learners, each forming a respective feature, to make a prediction, wherein the generation of a feature is based on given relational sample data, given target values and a given loss function, by performing processes including: a) forming a preliminary feature through at least one aggregation function that aggregates relational sample data or learnable weights applied to the sample data; b) calculating aggregation results with the aggregation function; c) using the loss function to calculate pseudo-residuals based on at least the target values, and determining, based upon an optimization criterion formula, how strongly the aggregation results relate to the pseudo-residuals calculated with the target values; d) adjusting the preliminary feature by incrementally changing a condition applied to the aggregation function, wherein the condition affects which sample values of the sample data are to be aggregated with the aggregation function; e) calculating aggregation results of the adjusted preliminary feature by adjusting the aggregation results of the preliminary feature before adjustment, through at least: e1) determining which sample values of the sample data are affected by the changed condition, e2) adjusting the aggregation results of the preliminary feature before adjustment, to account for a contribution of the sample values affected by the changed condition; f) determining how strongly the aggregation results of the adjusted preliminary feature relate to the pseudo-residuals calculated with the target values; and g) repeating processes (d) through (f) for a plurality of incremental changes, and selecting a feature for which the aggregation results relate most strongly to the pseudo-residuals calculated with the target values. |
| Patent References Cited: | 2017/0177309, June 2017, Bar-Or; 2017/0193546, July 2017, Bennett; 2017/0286502, October 2017, Bar-Or; 2018/0293723, October 2018, Bae et al.; 2019/0384762, December 2019, Hill; 2020/0034749, January 2020, Kumar |
| Other References: | Lam et al., "One button machine for automating feature engineering in relational databases," IBM Research, Jun. 1, 2017, 9 pages; European Search Report for corresponding EP Application No. EP20189274, dated Feb. 12, 2021, 10 pages. |
| Primary Examiner: | Aspinwall, Evan |
| Attorney, Agent or Firm: | Hoffman Warnick LLC |
| Accession Number: | edspgr.11366806 |
| Database: | USPTO Patent Grants |
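
Claim 1's processes (a) through (g) amount to a greedy search over conditions applied to an aggregation function, scored by how strongly the aggregation results relate to pseudo-residuals. The following is a minimal sketch of that loop, not the patented implementation: it assumes a single SUM aggregation, a numeric condition column whose thresholds are tried in descending order, squared loss (so the pseudo-residuals are plain residuals), and absolute Pearson correlation standing in for the unspecified optimization criterion formula. All names (`generate_feature`, `keys`, `cond`, and so on) are hypothetical.

```python
import numpy as np

def pseudo_residuals(y, pred):
    # For squared loss L = (y - pred)^2 / 2, the negative gradient
    # (pseudo-residual) is simply y - pred.
    return y - pred

def generate_feature(keys, values, cond, y, pred, n_thresholds=20):
    """Greedily search threshold conditions 'cond <= t' for one SUM feature.

    keys   : join key per peripheral-table row (indices 0..len(y)-1)
    values : peripheral-table column to aggregate
    cond   : peripheral-table column the condition is applied to
    y, pred: target values and current ensemble prediction (population table)
    """
    res = pseudo_residuals(y, pred)

    # (a)-(b): the preliminary feature aggregates *all* matching sample rows.
    order = np.argsort(cond)
    keys_s, vals_s, cond_s = keys[order], values[order], cond[order]
    agg = np.zeros(len(y))
    np.add.at(agg, keys_s, vals_s)

    def criterion(f):
        # (c)/(f): |Pearson correlation| with the pseudo-residuals.
        if f.std() == 0.0 or res.std() == 0.0:
            return 0.0
        return abs(np.corrcoef(f, res)[0, 1])

    best_t, best_score = None, criterion(agg)

    # (d)-(g): incrementally tighten the condition.  Each step only touches
    # the rows newly excluded by the lower threshold (e1), so the aggregation
    # results are adjusted in place rather than recomputed from scratch (e2).
    hi = len(cond_s)
    for t in np.quantile(cond, np.linspace(1.0, 0.0, n_thresholds)):
        lo = np.searchsorted(cond_s, t, side="right")
        np.subtract.at(agg, keys_s[lo:hi], vals_s[lo:hi])
        hi = lo
        score = criterion(agg)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

Because each pass subtracts only the slice `keys_s[lo:hi]`, the cost of trying many thresholds stays close to one full aggregation; this mirrors the "difference only" update of claim 7.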
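Claim 15's optimization criterion update formula can be read as maintaining running sums so that a change in a single aggregation result updates the criterion in O(1), instead of re-evaluating the formula over the whole population table. A sketch under that reading, again with Pearson correlation as a stand-in criterion and all class and variable names invented for illustration:

```python
import numpy as np

class IncrementalCriterion:
    """Running-sum form of an |Pearson correlation| criterion.

    When one aggregation result f[k] changes by delta, the cached sums are
    updated in O(1); the full formula is never re-evaluated over all rows.
    """
    def __init__(self, f, r):
        self.f = f.astype(float).copy()   # aggregation results
        self.r = r.astype(float)          # pseudo-residuals
        self.n = len(f)
        self.sf = self.f.sum()            # sum of f
        self.sff = (self.f * self.f).sum()  # sum of f^2
        self.sfr = (self.f * self.r).sum()  # sum of f*r
        self.sr = self.r.sum()
        self.srr = (self.r * self.r).sum()

    def update(self, k, delta):
        # (f[k]+delta)^2 - f[k]^2 = 2*f[k]*delta + delta^2
        fk = self.f[k]
        self.sf += delta
        self.sff += 2.0 * fk * delta + delta * delta
        self.sfr += delta * self.r[k]
        self.f[k] = fk + delta

    def value(self):
        cov = self.sfr - self.sf * self.sr / self.n
        var_f = self.sff - self.sf ** 2 / self.n
        var_r = self.srr - self.sr ** 2 / self.n
        if var_f <= 0.0 or var_r <= 0.0:
            return 0.0
        return abs(cov / np.sqrt(var_f * var_r))

# Usage: adjust one aggregation result and re-read the criterion cheaply.
crit = IncrementalCriterion(np.array([1.0, 2.0, 3.0]), np.array([0.5, 1.0, 2.0]))
crit.update(k=2, delta=0.25)   # f[2]: 3.0 -> 3.25
print(crit.value())
```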
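The abstract emphasizes that generated features can be exported as human-readable SQL, and claims 16 and 18 describe joining aggregation results from peripheral tables onto a population table holding the targets. A hypothetical rendering of one learned feature as SQL might look like the output below; the schema (`population`, `peripheral`, `population_id`) and all column names are invented for illustration, not taken from the patent.

```python
def feature_to_sql(agg="SUM", value_col="value", cond_col="cond_col",
                   threshold=30.0, feature_name="feature_1"):
    """Render one learned feature (aggregation + threshold condition) as SQL
    joining the peripheral table back onto the population table."""
    return f"""
SELECT p.id,
       p.target,
       COALESCE({agg}(CASE WHEN t.{cond_col} <= {threshold}
                           THEN t.{value_col} END), 0) AS {feature_name}
FROM population p
LEFT JOIN peripheral t ON t.population_id = p.id
GROUP BY p.id, p.target;
""".strip()

print(feature_to_sql())
```

Per claim 18, the resulting `feature_1` column sits next to the targets in the population table and can be fed directly to a downstream learner such as xgboost or lightgbm.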