Enhancing Hydrogen Energy Consumption Prediction Based on Stacked Machine Learning Model with Shapley Additive Explanations
| Published in: | Process integration and optimization for sustainability Vol. 9; no. 5; pp. 1847 - 1868 |
|---|---|
| Main Authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Singapore, Springer Nature Singapore, 01.11.2025, Springer Nature B.V |
| Subjects: | |
| ISSN: | 2509-4238, 2509-4246 |
| Summary: | Enhancing hydrogen-based energy systems requires accurate forecasts of hydrogen consumption. This paper compares a random forest regressor, a multi-layer perceptron, a support vector regressor, and a gradient boosting regressor for forecasting hydrogen consumption from production capacity, consumption capacity, latitude, and longitude. The dataset, drawn from the European Hydrogen Observatory, was analyzed using advanced statistical techniques to ensure robust preprocessing and feature selection. It encompasses hydrogen consumption data from various production pathways (e.g., electrolysis and steam reforming), as reported by the European Clean Hydrogen Observatory, although specific production methods are not individually labeled. The study evaluates model performance using standard regression metrics: mean absolute error, mean squared error, root mean squared error, coefficient of determination, and median absolute error. Among the individual models, the random forest regressor performed best, with a coefficient of determination of 0.9789 and low error metrics (mean absolute error = 0.0010, mean squared error = 0.0030, and root mean squared error = 0.0034). To further improve prediction accuracy, a stacking ensemble model was developed that combines the random forest regressor, multi-layer perceptron, support vector regressor, and gradient boosting regressor as base learners, with Ridge regression serving as the meta-learner. The stacked model significantly outperformed all individual models, achieving a coefficient of determination of 0.9963 with reduced error metrics (mean absolute error = 0.0009, mean squared error = 0.0002, and root mean squared error = 0.0014). To gain deeper insight into feature importance, SHapley Additive exPlanations (SHAP) analysis was conducted on the stacked model. The results indicate that latitude, longitude, and production capacity are the most influential factors affecting hydrogen consumption. These findings indicate that the stacked model can effectively forecast hydrogen usage, helping researchers and policymakers optimize hydrogen distribution and consumption strategies for sustainable energy planning. |
|---|---|
| DOI: | 10.1007/s41660-025-00539-2 |
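
The stacking setup described in the summary (four base learners with a Ridge meta-learner, scored with the five listed regression metrics) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn and NumPy, and the data, feature ranges, target formula, hyperparameters, and train/test split are all hypothetical placeholders, since the record gives none of them. The synthetic data stands in for the observatory dataset, which is not reproduced here.

```python
import math

import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): NOT the European Hydrogen Observatory dataset.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.uniform(0.0, 1.0, n),     # production capacity (scaled, hypothetical)
    rng.uniform(0.0, 1.0, n),     # consumption capacity (scaled, hypothetical)
    rng.uniform(35.0, 70.0, n),   # latitude
    rng.uniform(-10.0, 30.0, n),  # longitude
])
# Hypothetical target: an arbitrary function of the features plus noise.
y = (0.6 * X[:, 0] + 0.3 * X[:, 1]
     + 0.01 * np.sin(X[:, 2] / 5.0) + rng.normal(0.0, 0.01, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Base learners named in the summary; hyperparameters are illustrative only.
# The MLP and SVR are wrapped in a scaler because both are scale-sensitive.
base_learners = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0))),
    ("svr", make_pipeline(StandardScaler(), SVR())),
    ("gbr", GradientBoostingRegressor(random_state=0)),
]

# Ridge regression combines the cross-validated base-learner predictions.
stack = StackingRegressor(estimators=base_learners, final_estimator=Ridge(), cv=5)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# The five regression metrics listed in the summary.
mse = mean_squared_error(y_test, y_pred)
metrics = {
    "mae": mean_absolute_error(y_test, y_pred),
    "mse": mse,
    "rmse": math.sqrt(mse),  # RMSE is the square root of MSE by definition
    "r2": r2_score(y_test, y_pred),
    "medae": median_absolute_error(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

`StackingRegressor` trains the meta-learner on out-of-fold predictions from the base learners (`cv=5` here), so the Ridge model never sees predictions made on data a base learner was fitted to. Whether the paper used the same cross-validation scheme is not stated in the record.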