Detecting Data Outliers with Machine Learning

Authors

  • Ghalia Nassreddine, Faculty of Business, Jinan University of Lebanon, Tripoli, Lebanon
  • Joumana Younis, Faculty of Business, Jinan University of Lebanon, Tripoli, Lebanon
  • Thaer Falahi, Faculty of Business, Jinan University of Lebanon, Tripoli, Lebanon

DOI:

https://doi.org/10.55145/ajest.2023.02.02.018

Keywords:

Clustering, machine learning, outlier detection, k_means, anomaly data

Abstract

Anomalies are instances or groups of data points that occur very rarely in a dataset and whose features differ significantly from those of most of the data. Data is now widely used in all sectors, so undetected anomalies can cause problems. Anomaly detection involves examining individual data points and identifying rare occurrences that look suspicious because they deviate from the established pattern of behavior. This study builds an anomaly detection approach using a machine learning technique, the distance-based clustering method k_means. First, the existence of anomalies is tested using a p_value. The anomalous data points are then detected using the clustering method. The proposed method was tested on real data collected from Kaggle. The results show that the k_means algorithm performs well in detecting outlier data.
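
To make the clustering step concrete, the following is a minimal sketch of distance-based outlier flagging with k-means. It is not the authors' implementation. The function names (kmeans, flag_outliers), the evenly spaced centroid initialization, the mean-plus-z-times-standard-deviation cutoff, and the sample points are all assumptions chosen for illustration. The p_value check is left out because the abstract does not name the statistical test used.

```python
import math


def kmeans(points, k, iters=50):
    """Plain k-means on a list of equal-length tuples.

    Assumption: centroids start at evenly spaced positions in the input
    list so the run is reproducible; practical code would more likely
    use a randomized scheme such as k-means++.
    """
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def flag_outliers(points, k=2, z_threshold=2.0):
    """Return points whose distance to the nearest centroid is unusually large.

    Assumption: "unusually large" means more than z_threshold standard
    deviations above the mean nearest-centroid distance.
    """
    centroids = kmeans(points, k)
    dists = [min(math.dist(p, c) for c in centroids) for p in points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    cutoff = mean + z_threshold * std
    return [p for p, d in zip(points, dists) if d > cutoff]


# Hypothetical sample: two tight groups plus one distant point.
sample = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.2), (0.9, 0.9),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.2), (7.8, 7.8),
    (20.0, 20.0),
]
outliers = flag_outliers(sample, k=2, z_threshold=2.0)
print(outliers)
```

One known limitation of this kind of sketch is that an extreme point can capture a centroid of its own, in which case its distance is zero and it is not flagged. The choice of k, the initialization, and the cutoff therefore all affect the result.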

Published

2023-05-16

How to Cite

Nassreddine, G., Younis, J., & Falahi, T. (2023). Detecting Data Outliers with Machine Learning. Al-Salam Journal for Engineering and Technology, 2(2), 152–164. https://doi.org/10.55145/ajest.2023.02.02.018

Section

Articles