Transformer-based placement heuristic for online 2D strip packing problem

Bibliographic Details
Published in:	Computers & industrial engineering Vol. 210; p. 111464
Main Authors:	Kaleta, Mariusz, Kołodziejczyk, Waldemar, Zoltowska, Izabela
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd 01.12.2025
Subjects:	2D strip packing problem; Algorithm Selection Problem; MaxRects heuristic; Online packing; Proximal Policy Optimization (PPO); Reinforcement learning; Transformer architecture
ISSN:	0360-8352

Description
Summary:	This paper addresses the online 2D Strip Packing Problem (2D-SPP), where rectangular items must be packed sequentially into a strip of fixed width, with the objective of minimizing the total packing height. We propose a novel reinforcement learning (RL) approach based on a transformer encoder–decoder architecture, optimized using Proximal Policy Optimization (PPO). Unlike traditional heuristic methods, which often lack adaptability, our model dynamically selects candidate placements by analyzing spatial relationships between packed items and available free spaces, represented using variants of the MaxRects heuristic. This design enables the agent to generalize across problem instances of varying length. Extensive experiments on a variety of synthetic and real-world datasets, including NGCUT and recursive slicing scenarios, demonstrate that our model consistently outperforms classic heuristics such as MaxRectsBL, MaxRectsBSSF, MaxRectsBAF, and MaxRectsBLSF. In particular, our method achieves up to 73% improvement in packing efficiency on longer episodes and maintains high performance even in generalization settings. The study also introduces the Algorithm Selection Problem to the 2D-SPP domain, showing that transformer-based RL agents can effectively learn heuristic strategies for online combinatorial optimization.
• An RL agent uses a transformer to solve online 2D strip packing efficiently.
• Learns dynamic placement via MaxRects-inspired representations.
• Outperforms classic MaxRects variants by up to 73% in long episodes.
• Trained with PPO; handles variable-length input without positional embeddings.
• Addresses the Algorithm Selection Problem in 2D-SPP with learned strategies.
DOI:	10.1016/j.cie.2025.111464
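The summary states that candidate placements are represented using MaxRects variants (BL, BSSF, BAF, BLSF). The Python sketch below is a minimal MaxRects-style strip packer that makes this concrete, under assumptions this record does not confirm: items are not rotated, the strip is unbounded upward, and each rule scores free rectangles as these rules are commonly defined in MaxRects implementations, proposing the bottom-left corner of its best free rectangle. Every name in it (StripPacker, pack_online, greedy_lowest_top, and so on) is hypothetical, and greedy_lowest_top is only a simple stand-in for the selection step; it is not the authors' transformer policy.

```python
# Hypothetical MaxRects-style sketch, not the paper's implementation (no rotation assumed).
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple

INF = float("inf")  # assumption: the strip is unbounded upward
RULES = ("BL", "BSSF", "BAF", "BLSF")


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def _score(rule: str, f: Rect, w: float, h: float) -> Tuple[float, float]:
    """Score for putting a w x h item in the bottom-left corner of free rect f (lower is better)."""
    dw, dh = f.w - w, f.h - h  # leftover width and height inside f
    if rule == "BL":    # bottom-left: lowest top edge, then leftmost
        return (f.y + h, f.x)
    if rule == "BSSF":  # best short side fit
        return (min(dw, dh), max(dw, dh))
    if rule == "BAF":   # best area fit, ties broken by short side
        return (f.w * f.h - w * h, min(dw, dh))
    if rule == "BLSF":  # best long side fit
        return (max(dw, dh), min(dw, dh))
    raise ValueError(f"unknown rule {rule!r}")


def _split(f: Rect, u: Rect) -> List[Rect]:
    """Maximal free rectangles left inside f once u is occupied."""
    if u.x >= f.x + f.w or u.x + u.w <= f.x or u.y >= f.y + f.h or u.y + u.h <= f.y:
        return [f]  # no overlap: f is unchanged
    parts = []
    if u.x > f.x:
        parts.append(Rect(f.x, f.y, u.x - f.x, f.h))  # left of u
    if u.x + u.w < f.x + f.w:
        parts.append(Rect(u.x + u.w, f.y, f.x + f.w - u.x - u.w, f.h))  # right of u
    if u.y > f.y:
        parts.append(Rect(f.x, f.y, f.w, u.y - f.y))  # below u
    if u.y + u.h < f.y + f.h:
        parts.append(Rect(f.x, u.y + u.h, f.w, f.y + f.h - u.y - u.h))  # above u
    return parts


def _contains(a: Rect, b: Rect) -> bool:
    return (a.x <= b.x and a.y <= b.y
            and b.x + b.w <= a.x + a.w and b.y + b.h <= a.y + a.h)


def _prune(rects: List[Rect]) -> List[Rect]:
    """Drop free rectangles contained in another one; keep the first of any duplicates."""
    return [a for i, a in enumerate(rects)
            if not any(i != j and _contains(b, a) and (a != b or j < i)
                       for j, b in enumerate(rects))]


class StripPacker:
    def __init__(self, width: float) -> None:
        self.width = width
        self.free: List[Rect] = [Rect(0, 0, width, INF)]
        self.placed: List[Rect] = []

    def candidates(self, w: float, h: float) -> Dict[str, Rect]:
        """Placement each MaxRects rule would propose for a w x h item."""
        if w > self.width:
            raise ValueError("item is wider than the strip")
        fits = [f for f in self.free if w <= f.w and h <= f.h]
        proposals = {}
        for rule in RULES:
            best = min(fits, key=lambda f: _score(rule, f, w, h))
            proposals[rule] = Rect(best.x, best.y, w, h)
        return proposals

    def place(self, r: Rect) -> None:
        """Commit a placement and rebuild the list of maximal free rectangles."""
        self.free = _prune([p for f in self.free for p in _split(f, r)])
        self.placed.append(r)

    @property
    def height(self) -> float:
        return max((r.y + r.h for r in self.placed), default=0)


def pack_online(width: float, items: Iterable[Tuple[float, float]],
                choose: Callable[[Dict[str, Rect]], str]) -> StripPacker:
    """Items arrive one by one; choose decides which rule's placement to commit."""
    packer = StripPacker(width)
    for w, h in items:
        proposals = packer.candidates(w, h)
        packer.place(proposals[choose(proposals)])
    return packer


def greedy_lowest_top(proposals: Dict[str, Rect]) -> str:
    """Hypothetical stand-in for a learned selector: lowest top edge, ties go to the first rule."""
    return min(proposals, key=lambda k: proposals[k].y + proposals[k].h)


p = pack_online(10, [(4, 3), (6, 3), (10, 2)], greedy_lowest_top)
print(p.height)  # 5
```

Because a free rectangle spanning the full strip width always sits above the highest packed item, any item no wider than the strip gets at least one feasible proposal from every rule. On one reading of the summary, replacing greedy_lowest_top with a learned scorer is where the RL agent would act.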
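The agent is trained with Proximal Policy Optimization (PPO). As a generic reference point, the function below computes the standard PPO-Clip surrogate loss, -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)) with r = pi_new(a|s) / pi_old(a|s), in plain Python. It is not the authors' training code: the clip range eps = 0.2 is an assumed common default, the name ppo_clip_loss is hypothetical, and the value-function and entropy terms that full PPO implementations usually add are omitted.

```python
import math
from typing import Sequence


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], eps: float = 0.2) -> float:
    """Negated PPO-Clip surrogate averaged over a batch; minimise it to improve the policy."""
    if not (len(logp_new) == len(logp_old) == len(advantages) > 0):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                    # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1 - eps), 1 + eps)  # clip(ratio, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)     # pessimistic of the two surrogates
    return -total / len(advantages)


# Sample 1: action became more likely (ratio 1.6) and was good (A = +1): gain capped at 1.2.
# Sample 2: action became less likely (ratio 0.5) and was bad (A = -1): the min picks the
# clipped -0.8, so pushing the probability further down earns nothing extra.
loss = ppo_clip_loss([math.log(0.8), math.log(0.25)],
                     [math.log(0.5), math.log(0.5)],
                     [1.0, -1.0])
print(round(loss, 6))  # -0.2
```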
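One highlight notes that the model handles variable input without positional embeddings. The sketch below illustrates the general property behind this: an attention step without positional information scores each candidate from its content alone, so it accepts any number of candidates, and reordering them only reorders the output probabilities. It is a generic, untrained illustration on random vectors; scoring candidates against a single query vector and the name pointer_probs are assumptions made for the example, not details of the paper's encoder–decoder.

```python
import math
import random
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def pointer_probs(query: Vec, keys: List[Vec]) -> Vec:
    """Scaled dot-product scores of one query over N keys, normalised into a distribution."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


# Illustrative only: random vectors stand in for learned candidate encodings (assumption).
rng = random.Random(0)
d = 8
query = [rng.gauss(0, 1) for _ in range(d)]
cands = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]  # 5 candidate placements
probs = pointer_probs(query, cands)

perm = [3, 0, 4, 1, 2]
probs_perm = pointer_probs(query, [cands[i] for i in perm])
# No positional embeddings: reordering the candidates only reorders their probabilities.
assert all(math.isclose(probs_perm[j], probs[i]) for j, i in enumerate(perm))
# The same function works for any number of candidates.
print(len(pointer_probs(query, cands[:2])), len(pointer_probs(query, cands)))  # 2 5
```

In a full transformer the same argument applies layer by layer, since self-attention without positional encodings is permutation-equivariant and the feed-forward sublayer acts on each element separately.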