Transformer-based placement heuristic for online 2D strip packing problem
| Published in: | Computers & industrial engineering Vol. 210; p. 111464 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.12.2025 |
| Subjects: | |
| ISSN: | 0360-8352 |
| Summary: | This paper addresses the online 2D Strip Packing Problem (2D-SPP), where rectangular items must be packed sequentially into a strip of fixed width, with the objective of minimizing the total height of packing. We propose a novel reinforcement learning (RL) approach based on a transformer encoder–decoder architecture, optimized using Proximal Policy Optimization (PPO). Unlike traditional heuristic methods, which often lack adaptability, our model dynamically selects candidate placements by analyzing spatial relationships between packed items and available free spaces, represented using variants of the MaxRects heuristic. This design enables the agent to generalize across different problem instances with varying lengths. Extensive experiments on a variety of synthetic and real-world datasets, including NGCUT and recursive slicing scenarios, demonstrate that our model consistently outperforms classic heuristics such as MaxRectsBL, MaxRectsBSSF, MaxRectsBAF, and MaxRectsBLSF. In particular, our method achieves up to 73% improvement in packing efficiency on longer episodes and maintains high performance even in generalization settings. The study also introduces the Algorithm Selection Problem to the 2D-SPP domain, showing that transformer-based RL agents can effectively learn heuristic strategies for online combinatorial optimization. |
| Highlights: | • RL agent uses transformer to solve online 2D strip packing efficiently. • Learns dynamic placement via MaxRects-inspired representations. • Outperforms classic MaxRects variants by up to 73% in long episodes. • Trained with PPO, handles variable input without positional embeddings. • Addresses Algorithm Selection Problem in 2D-SPP with learned strategies. |
| DOI: | 10.1016/j.cie.2025.111464 |
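
For readers unfamiliar with the baselines named in the summary, the following is a minimal sketch of an online MaxRects-style bottom-left placement rule for strip packing. It is not the paper's transformer/PPO model, and it may differ from the exact MaxRectsBL variant the authors evaluate: the scoring rule (lowest resulting top edge, then leftmost position), the absence of rotation, and all function names (`pack_online`, `split`, `prune`, and so on) are assumptions made for illustration.

```python
import math

# Rectangles are tuples (x, y, width, height). The strip has a fixed width
# and unbounded height, modelled here with math.inf (an illustrative choice).

def intersects(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def contains(outer, inner):
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def split(free, placed):
    """Return the maximal free rectangles left in `free` around `placed`."""
    fx, fy, fw, fh = free
    px, py, pw, ph = placed
    parts = []
    if px > fx:                      # space to the left of the item
        parts.append((fx, fy, px - fx, fh))
    if px + pw < fx + fw:            # space to the right of the item
        parts.append((px + pw, fy, fx + fw - (px + pw), fh))
    if py > fy:                      # space below the item
        parts.append((fx, fy, fw, py - fy))
    if py + ph < fy + fh:            # space above the item
        parts.append((fx, py + ph, fw, fy + fh - (py + ph)))
    return parts

def prune(rects):
    """Drop duplicates and free rectangles fully contained in another one."""
    unique = list(dict.fromkeys(rects))
    return [r for r in unique
            if not any(r != o and contains(o, r) for o in unique)]

def pack_online(strip_width, items):
    """Place items one at a time, in arrival order, without rotation.

    Returns (placements, height), where placements holds the bottom-left
    corner (x, y) chosen for each item.
    """
    free = [(0, 0, strip_width, math.inf)]
    placements, height = [], 0
    for w, h in items:
        fits = [f for f in free if w <= f[2] and h <= f[3]]
        if not fits:
            raise ValueError(f"item {w}x{h} is wider than the strip")
        # Assumed bottom-left rule: lowest top edge first, then leftmost x.
        fx, fy, _, _ = min(fits, key=lambda f: (f[1] + h, f[0]))
        placed = (fx, fy, w, h)
        placements.append((fx, fy))
        height = max(height, fy + h)
        next_free = []
        for f in free:
            if intersects(f, placed):
                next_free.extend(split(f, placed))
            else:
                next_free.append(f)
        free = prune(next_free)
    return placements, height
```

Calling `pack_online(10, [(4, 3), (6, 3), (5, 2), (5, 2)])` returns the chosen corners and the resulting strip height. In the approach the summary describes, a learned policy would choose among candidate placements instead of the fixed `min(...)` rule used here.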