Using a data-driven model to predict taxpayers filing false returns: a case of Zambia Revenue Authority.
Date
 2024 
Authors
Mubanga, Mubanga
Publisher
 The University of Zambia 
Abstract
 Tax fraud remains a global problem that causes significant economic losses in many countries, including Zambia. Traditional methods of tackling it often depend on labelled datasets, which are scarce because tax audits are slow and the selection of cases for audit is inherently biased. To address this data scarcity and offer a more immediate solution, this study introduces an unsupervised approach that combines K-means clustering with anomaly detection techniques. Using an extensive dataset of VAT declarations and associated refund transactions spanning several years, we demonstrate the method's potential for efficiently identifying likely cases of tax fraud. The significance of this work is twofold: it introduces an innovative approach to a persistent issue, and it applies that approach specifically to the Zambian context. Because it does not require exhaustively labelled data, the methodology offers a promising direction for strengthening tax fraud detection and contributes to a more resilient fiscal landscape for Zambia.
Keywords: K-means clustering, Anomaly detection, VAT, Tax fraud detection 
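Illustrative sketch (not taken from the thesis): the abstract does not say how clustering and anomaly detection are combined. One common pairing is to cluster the declarations with K-means and score each one by its distance to the nearest centroid, flagging unusually distant points for review. The example below assumes that pairing. The two features, the sample values, the first-k initialisation, and the mean-plus-two-standard-deviations cutoff are all hypothetical choices made for illustration.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means. Assumption: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    return centroids


def anomaly_scores(points, centroids):
    """Score each point by its distance to the closest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


def flag_anomalies(points, k=2, n_sigma=2.0):
    """Return indices whose score exceeds mean + n_sigma * std (hypothetical rule)."""
    scores = anomaly_scores(points, kmeans(points, k))
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return [i for i, s in enumerate(scores) if s > mean + n_sigma * std]


# Hypothetical, already-scaled features per declaration:
# (declared output VAT, refund claimed). Not real taxpayer data.
declarations = [
    (1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.9), (0.9, 1.1),
    (4.9, 5.2), (1.1, 1.0), (5.2, 5.1), (1.0, 1.2), (5.0, 4.8),
    (5.0, 12.0),  # refund far larger than comparable declarations
]

flagged = flag_anomalies(declarations, k=2)
print(flagged)
```

In this toy data only the last declaration is distant enough from both centroids to be flagged. On real data the features would need scaling, and the number of clusters and the cutoff would need tuning. A flagged declaration is a candidate for audit, not a confirmed case of fraud.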
Description
 Master’s thesis in Computer Science