An under-Sampled Approach for Handling Skewed Data Distribution using Cluster Disjuncts

Syed Ziaur Rahman (1)
Dr. G Samuel Vara Prasad Raju (2)
Dr. Ali Mirza Mahmood (3)
(1) Andhra University

GJCST Volume 14 Issue C7

ResearchID: CSTSDEGP9YL

Data mining and knowledge discovery aim to uncover hidden and valuable knowledge in data sources. Traditional knowledge discovery algorithms are bottlenecked by the wide range of available data sources. Class imbalance is one such problem: it arises when a data source provides an unequal class distribution, i.e., examples of one class in a training data set vastly outnumber examples of the other class(es). Researchers have rigorously studied several techniques to alleviate class imbalance, including resampling algorithms and feature selection approaches. In this paper, we present a new hybrid framework, dubbed Majority Under-sampling based on Cluster Disjunct (MAJOR_CD), for learning from skewed training data. This algorithm provides a simpler and faster alternative by using the cluster disjunct concept. We conduct experiments on twelve UCI data sets from various application domains, comparing MAJOR_CD with five algorithms on six evaluation metrics. The empirical study suggests that MAJOR_CD is effective in addressing the class imbalance problem.
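The abstract does not describe how MAJOR_CD identifies cluster disjuncts or which majority examples it discards, so the following is only a minimal Python sketch of the general idea of cluster-based majority under-sampling, not the authors' algorithm. It assumes that plain k-means clustering of the majority class stands in for the cluster disjuncts and that each cluster keeps a share of its examples proportional to its size; the function names `imbalance_ratio`, `kmeans`, and `cluster_undersample`, and the toy data, are hypothetical.

```python
import math
import random
from collections import Counter

# Assumption: k-means clusters of the majority class stand in for the paper's
# "cluster disjuncts"; this is an illustrative sketch, not MAJOR_CD itself.


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples; returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def cluster_undersample(X, y, majority_label, k=3, seed=0):
    """Hypothetical cluster-based under-sampling of the majority class.

    The majority class is split into k clusters and each cluster keeps a share
    of its examples proportional to its size (at least one), so the reduced
    majority set is roughly as large as the remaining classes combined.
    """
    rng = random.Random(seed)
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    others = [(x, lab) for x, lab in zip(X, y) if lab != majority_label]
    target = len(others)

    clusters = {}
    for x, c in zip(majority, kmeans(majority, k, seed=seed)):
        clusters.setdefault(c, []).append(x)

    kept = []
    for members in clusters.values():
        n_keep = max(1, round(target * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(n_keep, len(members))))

    X_out = kept + [x for x, _ in others]
    y_out = [majority_label] * len(kept) + [lab for _, lab in others]
    return X_out, y_out


if __name__ == "__main__":
    rng = random.Random(42)
    X, y = [], []
    # Toy data: 60 majority points in three separated blobs, 12 minority points.
    for cx, cy in [(0, 0), (5, 5), (0, 5)]:
        for _ in range(20):
            X.append((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)))
            y.append(0)
    for _ in range(12):
        X.append((5 + rng.gauss(0, 0.5), rng.gauss(0, 0.5)))
        y.append(1)

    print("imbalance ratio before:", imbalance_ratio(y))  # 60 / 12 = 5.0
    Xb, yb = cluster_undersample(X, y, majority_label=0, k=3)
    print("imbalance ratio after:", imbalance_ratio(yb))
```

Compared with uniform random under-sampling, sampling per cluster (keeping at least one example from each cluster) makes it less likely that a small majority sub-concept vanishes entirely from the reduced training set; whether MAJOR_CD makes the same choice is not stated in the abstract.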

Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

Syed Ziaur Rahman. 2014. “An under-Sampled Approach for Handling Skewed Data Distribution using Cluster Disjuncts”. Global Journal of Computer Science and Technology - C: Software & Data Engineering GJCST-C Volume 14 (GJCST Volume 14 Issue C7).

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Version of record

v1.2

Issue date

September 25, 2014

Language

English
