Text Categorization and Machine Learning Methods: Current State Of The Art

Durga Bhavani Dasari

Dr. Venu Gopala Rao. K

α Jawaharlal Nehru Technological University, Hyderabad

Article Fingerprint

ResearchID

CSTSDE05N02

Abstract

In this information age, many documents are available in digital form and need to be classified by their text. To address this problem, researchers have focused on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of pre-classified documents, the characteristics of the categories. The main benefits of this approach over the manual definition of a classifier by domain experts are good effectiveness, reduced need for expert work, and straightforward portability to different domains. The paper examines the main approaches to text categorization within the machine learning paradigm and the present state of the art. Various issues pertaining to three different text similarity problems, namely semantic, conceptual, and contextual similarity, are also discussed.
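
The inductive process described in the abstract can be illustrated with a small sketch. The example below is not taken from the paper: it assumes a multinomial Naive Bayes learner with add-one smoothing as one possible choice of classifier, and the function names (`tokenize`, `train`, `classify`) and the four toy training documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Learn per-category statistics from (text, category) pairs."""
    doc_counts = Counter()              # number of training documents per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocab = set()                       # every word seen during training
    for text, category in labeled_docs:
        doc_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the category with the highest log-probability for the text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Log prior: share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[category][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical pre-classified documents, invented for this sketch.
training = [
    ("the team won the match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stocks fell as markets closed", "finance"),
]
model = train(training)
print(classify(model, "the striker scored in the match"))
print(classify(model, "stocks fell as interest rates raised"))
```

No domain expert writes rules here: the category characteristics are estimated entirely from the labeled examples, so moving to another domain only requires a different training set.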

Funding

No external funding was declared for this work.

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

No ethics committee approval was required for this article type.

Data Availability

Not applicable for this article.

How to Cite This Article

Durga Bhavani Dasari. 2012. “Text Categorization and Machine Learning Methods: Current State Of The Art”. Global Journal of Computer Science and Technology - C: Software & Data Engineering GJCST-C Volume 12 (GJCST Volume 12 Issue C11): .

GJCST Volume 12 Issue C11
Pg. 37-46

Journal Specifications

Crossref Journal DOI 10.17406/gjcst

Print ISSN 0975-4350

e-ISSN 0975-4172

Keywords

text categorization, text classification, text clustering, text mining

Submission Received December 9, 2011
Peer Review Double Blind
Handling Editor
Accepted January 1, 2012
Published January 15, 2012

Version of record

v1.2

Issue date

July 17, 2012

Language

Article Metrics

Total Score: 107

Country: India

Subject: Global Journal of Computer Science and Technology - C: Software & Data Engineering

Authors: Durga Bhavani Dasari, Dr. Venu Gopala Rao. K (PhD/Dr. count: 1)

View Count (all-time): 250

Total Views (Real + Logic): 10227

Total Downloads (simulated): 2809

Publish Date: 2012 07, Tue
