Classification of Crime Data for Crime Control Using C4.5 and Naïve Bayes Techniques

  • Georgina N. Obuandike, Department of Mathematical Sciences and IT, Federal University Dutsinma, Katsina State, Nigeria
  • John Alhasan, Department of Computer Science, Federal University of Technology, Minna, Niger State, Nigeria
  • M. B. Abdullahi, Department of Computer Science, Federal University of Technology, Minna, Niger State, Nigeria
Keywords: Data Mining, Crime Analysis, Naïve Bayesian, C4.5

Abstract

The analysis of crime data helps to uncover hidden trends that aid in understanding crime patterns and the nature of those who commit such crimes. It also enables appropriate strategies to be put in place to control such crimes. The literature shows that C4.5 and Naïve Bayes are effective classification algorithms that have been applied successfully to classification problems. Percentage split and 10-fold cross-validation are two approaches to training and testing classifiers. Both approaches were adopted in training and testing Naïve Bayes and C4.5 classifiers on crime data collected from selected Nigerian prisons. In this article, the crime dataset is classified into vulnerable and non-vulnerable classes to support effective crime control strategies. The classification algorithms are applied individually to real crime data, and their performance is evaluated using standard measures such as accuracy, time, and Receiver Operating Characteristic (ROC). The classification algorithms are also applied to the Breast Cancer and Iris datasets as a reliability test. The results showed that C4.5 achieved higher accuracy than Naïve Bayes on all three datasets. The results also revealed that both classifiers performed better under the percentage split approach than under 10-fold cross-validation.
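
The two evaluation approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are synthetic, the 66% training fraction is an assumption (the abstract does not state the split), only a small Gaussian Naïve Bayes is shown (C4.5 is omitted for brevity), and every function name is hypothetical.

```python
import math
import random

def percentage_split(n, train_fraction=0.66, seed=0):
    """Shuffle indices 0..n-1 and cut them into train and test lists.
    The 0.66 default is an assumption, not a value from the article."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]

def k_fold_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(y), stats)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(prior, stats):
        total = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return total
    return max(model, key=lambda c: log_posterior(*model[c]))

def accuracy(X, y, train, test):
    """Train on the train indices and score on the test indices."""
    model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
    hits = sum(predict(model, X[i]) == y[i] for i in test)
    return hits / len(test)

# Synthetic two-class data standing in for the crime records (assumption).
rng = random.Random(42)
X, y = [], []
for _ in range(100):
    X.append([rng.gauss(0, 1), rng.gauss(0, 1)])
    y.append("non-vulnerable")
    X.append([rng.gauss(3, 1), rng.gauss(3, 1)])
    y.append("vulnerable")

# Approach 1: a single percentage split.
train, test = percentage_split(len(X))
split_acc = accuracy(X, y, train, test)

# Approach 2: 10-fold cross-validation, averaging the per-fold accuracy.
cv_scores = [accuracy(X, y, tr, te) for tr, te in k_fold_splits(len(X))]
cv_acc = sum(cv_scores) / len(cv_scores)

print(f"percentage split accuracy: {split_acc:.3f}")
print(f"10-fold CV mean accuracy:  {cv_acc:.3f}")
```

A percentage split scores the classifier once on a single held-out portion, while 10-fold cross-validation averages ten scores so that every record is used for testing exactly once. The two estimates can therefore differ on the same data, which is the comparison the abstract reports.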

Published
2018-04-12
How to Cite
Obuandike, G. N., Alhasan, J., & Abdullahi, M. B. (2018). Classification of Crime Data for Crime Control Using C4.5 and Naïve Bayes Techniques. International Journal of Mathematical Sciences and Optimization: Theory and Applications, 2017, 139-153. Retrieved from http://ijmso.unilag.edu.ng/article/view/5