Detecting Data Outliers with Machine Learning
DOI:
https://doi.org/10.55145/ajest.2023.02.02.018
Keywords:
Clustering, machine learning, outlier detection, k_means, anomaly data
Abstract
Anomalies are instances or groups of data points that occur very rarely in a dataset and whose features differ significantly from those of most of the data. Data is now widely used in all sectors, so undetected anomalies can cause problems. Anomaly detection involves examining individual data points and identifying rare occurrences that appear suspicious because they deviate from the established pattern of behavior. In this study, an anomaly detection approach is built using a machine learning technique: the distance-based clustering method k-means (k_means). First, the existence of anomalies is tested using a p-value. Then the anomalous data points are detected using the clustering method. The proposed method was tested on real data collected from Kaggle. The results show that the k-means algorithm performs well in detecting outlier data.
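The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the number of clusters, the initialization, or the distance cutoff, so `k=2`, the farthest-point initialization, and the "mean plus two standard deviations" threshold (`n_std`) are all assumptions made here for illustration. The data is synthetic, not the Kaggle data, and the preliminary p-value test is omitted because the abstract does not name the statistical test used.

```python
import math
import statistics


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on a list of coordinate tuples."""
    # Assumption: deterministic farthest-point initialization, chosen only
    # so that the example is reproducible without a random seed.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


def detect_outliers(points, k, n_std=2.0):
    """Flag points that lie unusually far from their assigned centroid."""
    centroids, labels = kmeans(points, k)
    distances = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    # Assumption: cutoff at mean + n_std * standard deviation of the distances.
    threshold = statistics.fmean(distances) + n_std * statistics.pstdev(distances)
    return [p for p, d in zip(points, distances) if d > threshold]


# Synthetic data: two tight groups plus one point far from both.
data = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
    (5.0, 20.0),
]
outliers = detect_outliers(data, k=2)
print(outliers)  # [(5.0, 20.0)]
```

The isolated point is assigned to the nearer cluster and pulls its centroid slightly, but its distance to that centroid is still far above the cutoff, which is why a distance-based rule can separate it from the regular members. The choice of `k` and of the cutoff both affect which points are flagged, so in practice they would need to be tuned to the dataset.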
License
Copyright (c) 2023 Ghalia Nassreddine, Joumana Younis, Thaer Falahi
This work is licensed under a Creative Commons Attribution 4.0 International License.