An under-Sampled Approach for Handling Skewed Data Distribution using Cluster Disjuncts

Syed Ziaur Rahman

Dr. G Samuel Vara Prasad Raju

Dr. Ali Mirza Mahmood

Andhra University

Article Fingerprint

ResearchID

CSTSDEGP9YL

Abstract

Data mining and knowledge discovery aim to uncover hidden, valuable knowledge in data sources. Traditional knowledge discovery algorithms are bottlenecked by the wide range of data sources now available. Class imbalance is one such problem: it arises when a data source supplies classes unequally, i.e., examples of one class in a training data set vastly outnumber examples of the other class(es). Researchers have rigorously studied several techniques to alleviate class imbalance, including resampling algorithms and feature selection approaches. In this paper, we present a new hybrid framework, Majority Under-sampling based on Cluster Disjunct (MAJOR_CD), for learning from skewed training data. The algorithm offers a simpler and faster alternative by using the cluster disjunct concept. We conduct experiments on twelve UCI data sets from various application domains, with five algorithms used for comparison on six evaluation metrics. The empirical study suggests that MAJOR_CD is effective in addressing the class imbalance problem.
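The abstract does not spell out the steps of MAJOR_CD, so the following is only a minimal sketch of generic cluster-based majority under-sampling, not the authors' algorithm. It assumes numeric feature tuples, a plain k-means clustering of the majority class, and a per-cluster quota proportional to cluster size; the names `kmeans` and `undersample_majority` and the parameters `k` and `seed` are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples (illustrative helper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def nearest(p):
        # Index of the centroid closest to p (Euclidean distance).
        return min(range(k), key=lambda c: dist(p, centroids[c]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Final assignment against the final centroids.
    return [nearest(p) for p in points], centroids


def undersample_majority(X, y, majority_label, k=2, seed=0):
    """Shrink the majority class to roughly the size of the other classes.

    Assumption: each majority cluster contributes its examples closest to
    the centroid, with a quota proportional to the cluster's size.
    """
    maj = [x for x, lab in zip(X, y) if lab == majority_label]
    rest = [(x, lab) for x, lab in zip(X, y) if lab != majority_label]
    target = len(rest)
    if len(maj) <= target:
        return list(X), list(y)  # nothing to remove

    assign, centroids = kmeans(maj, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, a in zip(maj, assign) if a == c]
        if not members:
            continue
        quota = max(1, round(target * len(members) / len(maj)))
        members.sort(key=lambda p: dist(p, centroids[c]))
        kept.extend(members[:quota])

    X_out = kept + [x for x, _ in rest]
    y_out = [majority_label] * len(kept) + [lab for _, lab in rest]
    return X_out, y_out


if __name__ == "__main__":
    # Toy data: 12 majority examples (label 0) in two blobs, 4 minority (label 1).
    X = [(i * 0.1, 0.0) for i in range(6)] + [(5 + i * 0.1, 5.0) for i in range(6)]
    X += [(2.0, 2.0), (2.1, 2.2), (2.2, 1.9), (1.9, 2.1)]
    y = [0] * 12 + [1] * 4
    X_bal, y_bal = undersample_majority(X, y, majority_label=0)
    print(y_bal.count(0), "majority kept;", y_bal.count(1), "minority kept")
```

Because of rounding and the one-example minimum per cluster, the retained majority count only approximates the minority count. Minority examples are never removed.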

Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

How to Cite This Article

Syed Ziaur Rahman. 2014. “An under-Sampled Approach for Handling Skewed Data Distribution using Cluster Disjuncts”. Global Journal of Computer Science and Technology - C: Software & Data Engineering GJCST-C Volume 14 (GJCST Volume 14 Issue C7): 1-11.

GJCST Volume 14 Issue C7
Pg. 1-11

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Keywords

class imbalance, classification, cluster disjunct, MAJOR_CD, under-sampling

Submission Received December 11, 2013
Peer Review Double Blind
Accepted January 5, 2014
Published January 15, 2014

Version of record

v1.2

Issue date

September 25, 2014

Country: India

Subject: Global Journal of Computer Science and Technology - C: Software & Data Engineering
