Detecting Data Outliers with Machine Learning

Ghalia Nassreddine; Joumana Younis; Thaer Falahi

doi:10.55145/ajest.2023.02.02.018

Authors

Ghalia Nassreddine, Faculty of Business, Jinan University of Lebanon, Tripoli, Lebanon.
Joumana Younis, Faculty of Business, Jinan University of Lebanon, Tripoli, Lebanon.
Thaer Falahi, Faculty of Business, Jinan University of Lebanon, Tripoli, Lebanon.

DOI:

https://doi.org/10.55145/ajest.2023.02.02.018

Keywords:

Clustering, machine learning, outlier detection, k-means, anomaly data

Abstract

Anomalies are instances or collections of data that occur very rarely in a dataset and whose features differ significantly from most of the data. Data is now widely used in all sectors, so undetected anomalies can cause problems. Anomaly detection involves examining specific data points and detecting rare occurrences that seem suspicious because they deviate from the established pattern of behavior. In this study, an approach to anomaly detection is built using a machine learning technique: the distance-based clustering method k-means. First, the existence of anomalies is tested using a p-value. The anomalous data points are then detected using the clustering method. The proposed method was tested on real data collected from Kaggle. The results show that the k-means algorithm performs well in detecting outlier data.
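
As an illustration of the distance-based clustering idea described in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It clusters the points with k-means and flags those that lie unusually far from their nearest centroid. The function names kmeans and flag_outliers, the naive first-k initialization, and the mean + z * std threshold are all assumptions made for this example; the abstract does not specify how the distance cutoff is chosen, and the preliminary p-value test is not sketched because the test statistic is not named.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    # Assumption: naive deterministic initialization with the first k points.
    # A practical implementation would typically use k-means++ or random restarts.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids


def flag_outliers(points, centroids, z=2.0):
    """Return points whose nearest-centroid distance exceeds mean + z * std."""
    dists = [min(math.dist(p, c) for c in centroids) for p in points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    # Assumption: the cutoff mean + z * std is chosen for illustration only.
    return [p for p, d in zip(points, dists) if d > mean + z * std]


# Toy data (hypothetical): two tight groups plus one distant point.
data = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
        (10, 11), (11, 10), (11, 11), (30, 30)]
centers = kmeans(data, k=2)
print(flag_outliers(data, centers, z=2.0))  # prints [(30, 30)]
```

Note that the distant point still pulls its cluster's centroid toward itself, which is one reason the cutoff and the initialization need care on real data.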
