Value-added tax fraud detection with scalable anomaly detection techniques


Detailed bibliography
Published in: Applied Soft Computing, Volume 86, p. 105895
Main authors: Vanhoeyveld, Jellis; Martens, David; Peeters, Bruno
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 January 2020
ISSN: 1568-4946, 1872-9681
Description
Abstract: The tax fraud detection domain is characterized by very little labelled data (known fraud/legal cases), and these data are not representative of the population due to sample selection bias. We use unsupervised anomaly detection (AD) techniques, which are uncommon in tax fraud detection research, to deal with these domain issues. We analyse a unique dataset containing the VAT declarations and client listings of all Belgian VAT numbers pertaining to ten sectors. Our methodology consists of applying AD methods to firms belonging to the same sector and enables an efficient auditing strategy that can be adopted by tax authorities worldwide. The high lifts and hit rates observed in most sectors demonstrate the success of this approach. Sectoral differences exist due to varying market conditions and legal requirements across sectors, and we show that the optimal AD method is sector dependent. We focus on three methodological problems that expose issues in the related literature. (1) Can we design suitable input features? We develop new fraud indicators from specific fields of the VAT form and client listings and show the predictive value of the combination of these features. (2) Can we design fast algorithms to deal with the large data sizes that can occur in the tax domain? New methods are developed, and we demonstrate their scalability both theoretically and empirically. (3) How should fraud detection performance be assessed? A new evaluation methodology is proposed that provides reliable performance indications and guarantees that fraud cases are effectively detected by the proposed methods.
Highlights:
• Unsupervised anomaly detection shows high predictive power for VAT fraud detection.
• Individual sector analysis reveals sectoral differences.
• New value-added tax fraud indicators are proposed that successfully detect fraud.
• Fast anomaly detection algorithms are developed and their scalability is shown.
• Suitable evaluation methodology that guarantees fraudsters are effectively detected.
DOI: 10.1016/j.asoc.2019.105895
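The abstract describes scoring each firm only against peers in the same sector and judging the resulting audit ranking by hit rate and lift. The sketch below illustrates that general workflow under loudly stated assumptions. The kNN-distance anomaly score is a generic stand-in, not one of the paper's AD methods. The two-number feature vectors are made-up placeholders, not the paper's VAT fraud indicators. The helpers `knn_scores`, `score_by_sector`, and `hit_rate_and_lift` are hypothetical. Hit rate is taken here as the share of fraud cases among the top-k flagged firms, and lift as that hit rate divided by the sector's base fraud rate. These are common definitions but may differ from the paper's evaluation methodology.

```python
import math
from collections import defaultdict


def knn_scores(points, k):
    """Hypothetical helper: anomaly score = mean Euclidean distance to the k nearest other points."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        # A sector with a single firm has no peers; give it a neutral score of 0.0.
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def score_by_sector(firms, k=2):
    """Hypothetical helper: score each firm only against firms in its own sector.

    firms: iterable of (firm_id, sector, feature_tuple).
    Returns {sector: {firm_id: score}}; scores are not compared across sectors.
    """
    groups = defaultdict(list)
    for firm_id, sector, feats in firms:
        groups[sector].append((firm_id, feats))
    result = {}
    for sector, members in groups.items():
        ids = [fid for fid, _ in members]
        sector_scores = knn_scores([feats for _, feats in members], k)
        result[sector] = dict(zip(ids, sector_scores))
    return result


def hit_rate_and_lift(sector_scores, labels, top_k):
    """Hypothetical helper: hit rate among the top_k highest scores and lift over the base rate.

    labels: {firm_id: 1 for fraud, 0 for legal}; assumes every scored firm is labelled.
    """
    ranked = sorted(sector_scores, key=sector_scores.get, reverse=True)[:top_k]
    hit_rate = sum(labels[f] for f in ranked) / len(ranked)
    base_rate = sum(labels[f] for f in sector_scores) / len(sector_scores)
    lift = hit_rate / base_rate if base_rate else float("nan")
    return hit_rate, lift


# Made-up toy data: two sectors, each with one clearly deviating firm.
firms = [
    ("r1", "retail", (1.0, 1.0)),
    ("r2", "retail", (1.1, 0.9)),
    ("r3", "retail", (0.9, 1.0)),
    ("r4", "retail", (1.0, 1.1)),
    ("r5", "retail", (5.0, 5.0)),
    ("c1", "construction", (10.0, 10.0)),
    ("c2", "construction", (10.2, 9.9)),
    ("c3", "construction", (9.8, 10.1)),
    ("c4", "construction", (0.0, 0.0)),
]
labels = {"r1": 0, "r2": 0, "r3": 0, "r4": 0, "r5": 1,
          "c1": 0, "c2": 0, "c3": 0, "c4": 1}

scores = score_by_sector(firms, k=2)
for sector, sector_scores in scores.items():
    hr, lift = hit_rate_and_lift(sector_scores, labels, top_k=1)
    print(f"{sector}: hit rate={hr:.2f}, lift={lift:.2f}")
```

Grouping by sector before scoring means each firm is judged only against competitors facing similar market conditions and legal requirements, which mirrors the per-sector analysis the abstract reports. The quadratic pairwise-distance loop is for readability only and does not reflect the scalable algorithms the paper develops.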