Using a data-driven model to predict taxpayers filing false returns: a case of Zambia Revenue Authority.
Date
2024
Authors
Mubanga, Mubanga
Publisher
The University of Zambia
Abstract
Tax fraud remains a global problem that causes significant economic losses in many countries, including Zambia. Traditional detection methods often depend on labelled datasets, which are scarce because tax audits are slow and audit samples are selected with inherent bias. To address this data scarcity and offer a more immediate solution, this study introduces an unsupervised approach that combines K-means clustering with anomaly detection techniques. Using an extensive dataset of VAT declarations and associated refund transactions spanning several years, we demonstrate the method's potential to efficiently identify likely tax fraud cases. The significance of this work is twofold: it introduces an innovative approach to a persistent issue and applies it specifically to the context of Zambia. By bypassing the need for exhaustive labelled data, our methodology offers a promising direction for enhancing tax fraud detection capabilities and supporting a more resilient fiscal landscape for Zambia.
Keywords: K-means clustering, Anomaly detection, VAT, Tax fraud detection
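The general idea of pairing K-means clustering with anomaly detection can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state which features, number of clusters, or anomaly criterion were used. The two features (a refund-to-output-VAT ratio and an input-to-output-VAT ratio), the synthetic values, the evenly spaced centroid initialisation, and the distance-to-centroid score are all assumptions made for illustration only.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Initialisation is deterministic (k evenly spaced rows), a simplification
    chosen for reproducibility; k-means++ is the more usual choice.
    """
    n = len(points)
    centroids = [points[(i * n) // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

def anomaly_scores(points, centroids, labels):
    """Score each point by its distance to its own cluster centroid."""
    return [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]

def flag_anomalies(scores, top_n):
    """Return the indices of the top_n highest-scoring points."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]

# Synthetic, hypothetical VAT declarations:
# (refund / output VAT, input VAT / output VAT). Real features would
# normally be standardised before clustering.
points = [
    (0.10, 0.50), (0.12, 0.52), (0.11, 0.48), (0.09, 0.51),  # typical group A
    (0.40, 0.90), (0.42, 0.88), (0.41, 0.91), (0.39, 0.92),  # typical group B
    (0.95, 0.55),                                            # unusually high refund ratio
]

centroids, labels = kmeans(points, k=2)
scores = anomaly_scores(points, centroids, labels)
flagged = flag_anomalies(scores, top_n=1)
print(flagged)
```

Declarations that sit far from every cluster of typical behaviour receive the highest scores and would be the first candidates for audit review. A flagged declaration is a lead for investigation, not evidence of fraud. One known limitation of this kind of scoring is that an extreme point can end up as its own cluster, with a score of zero, if it is chosen as an initial centroid.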
Description
Master’s thesis in Computer Science