Fraud detection on big tax data using business intelligence, data mining tool: A case of Zambia revenue authority

Mwanza, Memorie
University of Zambia
Tax collection in developing countries has been associated with widespread fraud that is difficult to detect, owing to the growing volume of data and the absence of fully automated business processes. Zambia’s tax administration is no exception. The Zambia Revenue Authority (ZRA) holds large volumes of data that require complex mechanisms to extract useful tax information. The purpose of the study was to establish the magnitude of the challenges in detecting fraud on bulk tax data, to develop a model to serve as the basis for a prototype that detects fraud on ZRA tax data, and to design a tool that helps detect fraud on bulk tax data. Our baseline study showed that ZRA currently uses traditional methods such as targeted audits, random audits, and whistleblowing to detect fraud on tax data. The baseline also showed that it takes more than 7 days to detect anomalies or fraud on bulk tax data. These methods are tedious, time-consuming, and error-prone. A model was then designed that applies data mining outlier algorithms for fraud detection and is based on Continuous Monitoring of Distance Based and Distance Based Outlier Queries. A prototype, the ZRA Fraud Detector, was developed in Java using the NetBeans IDE and the Weka Java libraries, which implement numerous data mining algorithms. The back end was implemented in MySQL, and MySQL Workbench 6.3 CE, a unified visual tool for database architects and developers, was used to interact with the database. The prototype used both algorithms: underpayments and overpayments, as defined by business rules, were detected and marked as outliers. The results showed an improvement in detection speed: the tool took 2 milliseconds to detect anomalies in 1000 bulk tax records, whereas the traditional method takes more than 7 days to detect anomalies in one record. The tool was also able to cluster taxpayers into meaningful groups based on the business rules.
Keywords: business intelligence, data mining, fraud detection, outlier algorithm.
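The distance-based outlier idea mentioned in the abstract can be illustrated with a minimal Java sketch. This is not the thesis prototype and does not use Weka; it assumes the common definition in which a record is an outlier when fewer than k other records lie within radius r of it. The class name, the two features (declared tax and amount paid), and the sample values are all hypothetical, chosen only to show an underpayment and an overpayment standing apart from consistent records.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a brute-force distance-based outlier check.
// Names, features, and data are assumptions, not taken from the ZRA prototype.
class DistanceOutlierSketch {

    /** Euclidean distance between two feature vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the indices of records that have fewer than k neighbours
     * within radius r. Runs in O(n^2) time, which is fine for a sketch.
     */
    static List<Integer> findOutliers(double[][] records, double r, int k) {
        List<Integer> outliers = new ArrayList<>();
        for (int i = 0; i < records.length; i++) {
            int neighbours = 0;
            // Stop counting early once k neighbours have been found.
            for (int j = 0; j < records.length && neighbours < k; j++) {
                if (i != j && distance(records[i], records[j]) <= r) {
                    neighbours++;
                }
            }
            if (neighbours < k) {
                outliers.add(i);
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        // Hypothetical records: {declared tax, amount paid}.
        double[][] records = {
            {100, 100}, {102, 101}, {98, 99}, {101, 100},
            {100, 20},   // paid far less than declared (underpayment)
            {99, 180}    // paid far more than declared (overpayment)
        };
        System.out.println(findOutliers(records, 5.0, 2)); // prints [4, 5]
    }
}
```

In this sketch the radius r and the neighbour count k play the role that business rules would play in a real system: they decide how far a payment may deviate from comparable records before it is flagged.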
Masters of Engineering in Information Communication Technology, Security.
Tax evasion, Money laundering--Zambia