Neural Networks for Insurance Pricing with Frequency and Severity Data: A Benchmark Study from Data Preprocessing to Technical Tariff

Bibliographic Details
Title: Neural Networks for Insurance Pricing with Frequency and Severity Data: A Benchmark Study from Data Preprocessing to Technical Tariff
Authors: Freek Holvoet, Katrien Antonio, Roel Henckaerts
Contributors: Simon Gielis
Source: North American Actuarial Journal. 29:519-562
Publication Status: Preprint
Publisher Information: Informa UK Limited, 2025.
Publication Year: 2025
Subject Terms: FOS: Computer and information sciences, Computer Science - Machine Learning, predictive performance, 1502 Banking, Finance and Investment, Social Sciences, interpretable machine learning, neural networks, Business, Finance, 1603 Demography, Machine Learning (cs.LG), FOS: Economics and business, model comparison, Business & Economics, 0102 Applied Mathematics, Risk Management (q-fin.RM), pricing, 3502 Banking, finance and investment, 4901 Applied mathematics, property and casualty insurance, embeddings, Quantitative Finance - Risk Management
Description: Insurers usually turn to generalized linear models (GLMs) for modeling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of a generalized linear model on binned input data, a gradient-boosted tree model (GBM), a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). The CANNs combine a baseline prediction, established with either a GLM or a GBM, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes and numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network, and we explore their potential advantages in a frequency-severity setting. Model performance is evaluated not only on out-of-sample deviance but also with statistical and calibration performance criteria and managerial tools, to obtain more nuanced insights. Finally, we construct global surrogate models for the neural networks' frequency and severity models. These surrogates translate the essential insights captured by the FFNNs or CANNs to GLMs. The result is a technical tariff table that can easily be deployed in practice.
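The CANN structure and the frequency-severity evaluation described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration and not the authors' implementation: it assumes a log link, a baseline prediction from an already fitted GLM or GBM that stays fixed, a tiny one-hidden-layer correction network (the class CorrectionNet and all other names are invented for illustration), and the customary Poisson and gamma deviances for the frequency and severity targets. The output layer is initialized to zero, a common choice in CANN implementations that is assumed here, so the untrained CANN reproduces the baseline exactly. Training, autoencoder embeddings and the surrogate GLM step are omitted.

```python
import math
import random


def poisson_deviance(y, mu):
    """Mean Poisson deviance for claim counts; y*log(y/mu) is taken as 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (term - (yi - mi))
    return total / len(y)


def gamma_deviance(y, mu):
    """Mean gamma deviance for strictly positive claim severities."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += 2.0 * (-math.log(yi / mi) + (yi - mi) / mi)
    return total / len(y)


class CorrectionNet:
    """Hypothetical one-hidden-layer network giving an additive correction on the log scale."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        # Output layer starts at zero (assumed design choice), so the initial
        # correction is exactly 0 and the CANN reproduces the baseline.
        self.w2 = [0.0] * n_hidden
        self.b2 = 0.0

    def __call__(self, x):
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def cann_prediction(baseline, x, net, exposure=1.0):
    """CANN with log link: exposure * baseline * exp(correction).

    `baseline` is the per-unit-exposure prediction of an already fitted GLM or GBM;
    it stays fixed while only `net` would be trained (training is not shown).
    """
    return exposure * baseline * math.exp(net(x))


def technical_premium(expected_frequency, expected_severity):
    """Frequency-severity pure premium: expected claim count times expected claim size."""
    return expected_frequency * expected_severity


if __name__ == "__main__":
    features = [0.2, -1.0, 0.5]  # hypothetical, already preprocessed inputs
    freq = cann_prediction(baseline=0.1, x=features, net=CorrectionNet(3, 4), exposure=0.5)
    sev = cann_prediction(baseline=1200.0, x=features, net=CorrectionNet(3, 4, seed=1))
    print(freq, sev, technical_premium(freq, sev))
```

Starting the correction at zero means the CANN begins from the baseline tariff, and gradient-based training (not shown) then adjusts the correction network's weights; the product of the frequency and severity predictions gives the pure premium that a tariff table would record.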
Document Type: Article
Language: English
ISSN: 2325-0453
1092-0277
DOI: 10.1080/10920277.2025.2451860
DOI: 10.48550/arxiv.2310.12671
Access URL: http://arxiv.org/abs/2310.12671
https://lirias.kuleuven.be/handle/20.500.12942/727853
Rights: CC BY
Accession Number: edsair.doi.dedup.....8315926d8acd14a44898e1a7d10b402a
Database: OpenAIRE